This project aims to explore how social norms and community
interactions influence crime rates. By examining the social fabric and
relational dynamics within communities, we gain deeper insights into
crime that go beyond environmental or individual factors.
Focusing
on the sociological perspective, this study investigates the
relationship between community cohesion, social norms, and crime rates,
highlighting the impact of social structures and collective behaviors on
criminal activity.
The primary objective is to explore the connections between community cohesion and crime dynamics. This involves analyzing how variations in community cohesion correlate with crime rates and how social structures within communities contribute to these patterns. The study aims to use a Bayesian Hierarchical Model to account for different levels of social interactions, from individual to community-wide scales, to better understand these relationships.
The crime dataset used in this project is obtained from the UCI Machine Learning Repository, specifically the Communities and Crime dataset. This dataset is fetched using Python as per the provided instructions on the UCI website and then uploaded to a platform suitable for analysis in R.
The dataset contains 128 variables chosen for their potential link to crime, including community characteristics and law enforcement metrics, and includes the following types of data:
Social Cohesion Indicators: Data on community engagement, participation in local events, sense of community, and social trust derived from surveys.
Socio-economic Data: Information on income distribution, educational attainment, and employment rates within communities.
Crime Statistics: Detailed crime reports categorized by type and intensity, including data on locations and times.
The target variable, Per Capita Violent Crimes,
was calculated using population data and the sum of violent crimes
(murder, rape, robbery, and assault). Due to inconsistencies in rape
counts, some cities, mainly from the Midwestern USA, were excluded. All
numeric data were normalized to a 0.00-1.00 range using an unsupervised,
equal-interval binning method, preserving the distribution and skew of
each attribute but not the relationships between different attributes.
Extreme values more than 3 standard deviations from the mean were capped
at 1.00 or 0.00.
Due to time and memory constraints, the following variables were selected for this project:
Social Cohesion Indicators: Teen_2Par, YoungKids_2Par, Families_2Parents, Large_Families, Working_mom, Illegitimate_Births
Socio-economic Data: Median_Income, Employed, Unemployment, Below_Poverty, Degree_BS_Or_More, Inc_from_inv, Poor_English, Welfare_Public_Assist
The initial step of this project involved performing Exploratory Data Analysis (EDA) to understand the structure and distribution of the data, identify patterns, and detect any anomalies or outliers. This analysis provided valuable insights and helped in making informed decisions for data preprocessing and model building.
Part of the preprocessing was done in Google Colab using
Python (SDS2_preprocessing.ipynb). This step ensured that an
appropriate number of variables was selected and that the data was clean,
consistent, and suitable for further analysis.
The resulting reduced
dataset, which will be used for the project, includes the following
variables:
As a first step, observations containing zeros or ones in the target or in
the other variables were removed to ensure data consistency, reducing the
dataset from 1994 observations to 1038.
After this filtering, a Normal Q-Q plot and a Residuals vs. Fitted
values plot were produced to check whether the normality and
homoscedasticity assumptions are met:
The Normal Q-Q plot shows that residuals deviate from the reference line, particularly at the tails, suggesting some non-normality that might impact model assumptions. In the Residuals vs. Fitted Values plot, the residuals are scattered around zero, indicating general support for homoscedasticity, although there is slight variation across the fitted values. This variation could signal minor heteroscedasticity.
Applying a Box-Cox transformation can help mitigate these issues by making the residuals closer to normal and stabilizing their variance. This transformation improves the model’s overall fit and makes parameter estimates more reliable, potentially leading to more accurate predictions:
\(y(\lambda) = \begin{cases} \frac{y^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(y) & \text{if } \lambda = 0 \end{cases}\)
where \(y\) is the strictly positive target variable and \(\lambda\) is the transformation parameter.
The Normal Q-Q plot and Residuals vs. Fitted Values plot after the Box-Cox transformation show clear improvements. In the Q-Q plot, residuals align more closely with the reference line, especially in the middle, indicating a closer-to-normal distribution. Minor deviations remain at the tails but are less severe than before. The Residuals vs. Fitted plot now shows a more consistent spread around zero, with no evident pattern, supporting homoscedasticity.
Overall, these improvements suggest that the Box-Cox transformation has helped the model better meet normality and constant variance assumptions, enhancing its reliability and predictive robustness.
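The Box-Cox transformation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`box_cox`, `box_cox_log_likelihood`, `best_lambda`) and the sample values are hypothetical, and \(\lambda\) is chosen here by a grid search on the profile log-likelihood of an intercept-only normal model, which may differ from the regression-based procedure used for the plots in this report.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a strictly positive value y for parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def box_cox_log_likelihood(ys, lam):
    """Profile log-likelihood of lam, assuming the transformed data are normal
    with a common mean (intercept-only model; an assumption of this sketch)."""
    n = len(ys)
    zs = [box_cox(y, lam) for y in ys]
    mean_z = sum(zs) / n
    var_z = sum((z - mean_z) ** 2 for z in zs) / n
    jacobian = (lam - 1.0) * sum(math.log(y) for y in ys)
    return -0.5 * n * math.log(var_z) + jacobian

def best_lambda(ys, grid):
    """Return the grid value of lam with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: box_cox_log_likelihood(ys, lam))

# Illustrative right-skewed values in (0, 1); not taken from the crime data.
sample = [0.02, 0.03, 0.05, 0.08, 0.12, 0.20, 0.35, 0.60]
grid = [i / 10 for i in range(-20, 21)]  # candidate lambdas from -2.0 to 2.0
lam_hat = best_lambda(sample, grid)
transformed = [box_cox(y, lam_hat) for y in sample]
```

Because the transform is monotone for every \(\lambda\), the ordering of the observations is preserved; only the spacing between them changes.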
Subsequently, histograms and boxplots were plotted for each variable to visualize its distribution:
Histograms
Boxplots
Overall, the histograms and boxplots show pronounced skewness
in several variables, such as Large_Families,
Poor_English, Welfare_Public_Assist,
Below_Poverty, Illegitimate_Births, and
Speak_Eng_Only. The variables were therefore standardized to
mean 0 and standard deviation 1, which puts them on a common scale,
although a linear rescaling of this kind does not by itself remove
skewness:
## Variable Mean SD
## YoungKids_2Par YoungKids_2Par -1.381356e-16 1
## Teen_2Par Teen_2Par -2.681555e-16 1
## Employed Employed 4.829954e-17 1
## Below_Poverty Below_Poverty -7.031330e-17 1
## Degree_BS_Or_More Degree_BS_Or_More -9.782277e-17 1
## Inc_from_inv Inc_from_inv 4.600025e-17 1
## Speak_Eng_Only Speak_Eng_Only 2.401272e-16 1
## Illegitimate_Births Illegitimate_Births -3.290319e-17 1
## Large_Families Large_Families -9.937058e-17 1
## Poor_English Poor_English 1.353581e-17 1
## Families_2Parents Families_2Parents 1.789814e-17 1
## Working_mom Working_mom 3.087941e-16 1
## Median_Income Median_Income 1.688483e-16 1
## Unemployment Unemployment -1.188878e-16 1
## Welfare_Public_Assist Welfare_Public_Assist 1.084912e-16 1
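The standardization summarized in the table above (means numerically equal to zero, standard deviations equal to one) can be sketched as follows. The function names and the sample values are illustrative assumptions, and the sample standard deviation with an n - 1 denominator is assumed. The sketch also computes a moment-based skewness to show that a linear rescaling leaves the shape of a distribution unchanged.

```python
import math

def standardize(values):
    """Return z-scores using the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def skewness(values):
    """Moment-based skewness; unchanged by a positive linear rescaling."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Illustrative right-skewed values; not taken from the crime data.
raw = [0.02, 0.03, 0.05, 0.08, 0.12, 0.20, 0.35, 0.60]
z = standardize(raw)

z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / (len(z) - 1))
```

Means that print as values around 1e-16 rather than exactly 0, as in the table, are the usual floating-point residue of this computation.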
Now we can have a look at how the distribution changed after the normalization:
Histograms
Boxplots
State vs target
It was then helpful to investigate the relationship between the
target and the State variable, the
categorical stratifying variable:
From the plot, states like Arizona,
Michigan, and Pennsylvania show higher median
crime rates (in red), suggesting that these areas might experience
socio-economic or cultural factors that contribute to higher incidences
of crime.
In contrast, states like Montana, Wyoming,
and Vermont, indicated in green, have lower median crime
rates. These states might benefit from stronger community cohesion,
effective law enforcement, or other socio-economic factors that mitigate
crime.
The plot hints at a potential influence of climate on crime rates.
For example, states with harsher winters (like Vermont and
Wyoming) might have lower crime rates, supporting the CLASH
model’s theory that significant seasonal variation promotes
future-oriented behaviors and self-control. Conversely, states with
milder climates and less seasonal variation might experience higher
crime rates due to reduced need for long-term planning and increased
impulsivity.
This theory posits that consistent climates with less
variation require less future planning, leading to a “faster” life
strategy characterized by present-focused behaviors and reduced
self-control, which can contribute to higher rates of aggression and
violence. 1
Correlation plot
Then, to gain a clearer understanding of the relationships between various socio-economic factors and their impact on crime rates, a correlation plot was employed. This visual representation helps to identify significant positive and negative correlations within the dataset, providing a foundation for more detailed analysis.
The correlation plot reveals several key insights. There are strong
positive relationships between Families_2Parents,
Kids_2Parents and Teen_2Par, indicating that
communities with a high percentage of two-parent families also have many
kids and teens in such households. Similarly, higher educational
attainment (Degree_BS_Or_More) correlates with higher
Inc_from_inv, reflecting that individuals with higher
education levels are likely to have more investment-related income,
which aligns with general socio-economic trends.
Another notable positive correlation exists between
YoungKids_2Par and Teen_2Par, suggesting
consistency in family structures: communities where young children live
with two parents are also likely to be communities where teenagers do,
indicating stable family environments.
Conversely, significant negative correlations are observed between
Below_Poverty and Median_Income,
Employed, and Families_2Parents, indicating
that higher income, employment rates, and stable family structures are
associated with lower poverty levels.
Regarding crime rates (target), there are positive
correlations with Below_Poverty, Unemployment,
Welfare_Public_Assist, and
Illegitimate_Births, suggesting that higher levels of
poverty, unemployment, reliance on public assistance, and instances of
illegitimate births are linked to increased crime rates. In contrast,
negative correlations between target and variables like
Degree_BS_Or_More, Median_Income, and
Employed show that higher education, income, and employment
levels are associated with lower crime rates, reflecting the
socio-economic benefits of stability and education.
Other notable correlations include a positive relationship between
Poor_English and Large_Families, which could
indicate that families with language barriers tend to have more
children. However, the relationship between
Illegitimate_Births and Below_Poverty appears
to be neutral or only slightly positive: it hints at shared
socio-economic challenges, but the two variables are not strongly related.
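The pairwise coefficients behind a correlation plot of this kind can be sketched as below. Pearson correlation is assumed here, since the report does not state which coefficient was plotted. The column values are made up for illustration and only borrow the variable names used above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Illustrative values only; not taken from the crime data.
columns = {
    "Below_Poverty": [0.10, 0.25, 0.40, 0.55, 0.70],
    "Median_Income": [0.80, 0.60, 0.45, 0.30, 0.20],
    "target": [0.05, 0.20, 0.30, 0.50, 0.65],
}
matrix = correlation_matrix(columns)
```

Reading `matrix["target"]` row by row gives the same information as the target row of the plot: the sign shows the direction of the association and the magnitude shows its strength.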
In this section of the project, we will employ a Hierarchical Bayesian Model to analyze the relationships between various socio-economic factors and crime rates. Hierarchical Bayesian models are particularly powerful for this type of analysis because they allow us to account for both fixed effects and random effects, making them ideal for data that is grouped or nested, such as our data which is grouped by states.
A hierarchical Bayesian model includes both fixed effects, which
represent overall effects estimated across all groups, and random
effects, which account for variation between groups. In this
context, our fixed effects include socio-economic and demographic
variables such as YoungKids_2Par, Teen_2Par,
Employed, Below_Poverty,
Degree_BS_Or_More, Inc_from_inv,
Speak_Eng_Only, Illegitimate_Births,
Large_Families, Poor_English,
Families_2Parents, Working_mom,
Median_Income, Unemployment, and
Welfare_Public_Assist. The random effects are represented
by the State variable: each state receives its own intercept,
allowing the baseline crime rate to vary across different
states.
Given that the target variable, the normalized violent crime rate per 100,000 people, is continuous and lies between 0 and 1, we have chosen the beta distribution for our response variable. The beta family is well-suited for modeling proportions and rates constrained within the 0 to 1 interval.
We have selected weakly informative priors for our model to incorporate some prior knowledge while still allowing the data to inform the posterior estimates significantly. Specifically:
Normal(0, 1) prior for the fixed effects coefficients (class = “b”). This prior assumes that the coefficients are normally distributed with a mean of 0 and a standard deviation of 1, reflecting an assumption that most effects are small but allowing for the possibility of larger effects.
Gamma(1, 0.01) prior for the phi parameter, which controls the dispersion of the beta distribution for each observation. This choice mitigates the risk of extremely small values, ensuring a more stable estimation.
Normal(0, tau_state) prior for the random effects associated with states. This prior captures the variability across states while maintaining a focus on the overall mean effect.
Gamma(1, 1) prior for the standard deviation of the random effects (class = “sd”). This prior is selected for its ability to maintain positive values, reflecting the inherent property of standard deviations.
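To make the likelihood behind these priors concrete, the sketch below evaluates the log-density of one observation under a beta regression in the mean-precision parametrization, where the two shape parameters are `mu * phi` and `(1 - mu) * phi`. The logit link, the function names, and the example numbers are assumptions made for illustration; the report does not show the model code, so this should not be read as the exact specification that was fitted.

```python
import math

def inv_logit(eta):
    """Map a linear predictor on the real line to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_log_density(y, mu, phi):
    """Log-density of Beta(mu * phi, (1 - mu) * phi) at y, with 0 < y < 1."""
    a = mu * phi
    b = (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)

def observation_log_likelihood(y, x, beta, state_effect, phi):
    """Log-likelihood of one community.

    x and beta include the intercept term; state_effect is the random
    intercept of the community's state (assumed additive on the logit scale).
    """
    eta = sum(b_k * x_k for b_k, x_k in zip(beta, x)) + state_effect
    return beta_log_density(y, inv_logit(eta), phi)

# Hypothetical values: intercept plus two standardized predictors.
x = [1.0, 0.5, -1.2]
beta = [0.1, 0.2, -0.4]
log_lik = observation_log_likelihood(0.3, x, beta, state_effect=0.25, phi=20.0)
```

Larger values of `phi` concentrate the density around `mu`, which is why a prior on `phi` governs how much scatter around the fitted mean the model tolerates.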
We started with a basic hierarchical model in which the target variable is rescaled to lie between 0.001 and 0.999, so that it falls strictly inside the (0, 1) interval required by the beta distribution; this rescaling has little effect on the results:
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1038
## Unobserved stochastic nodes: 1099
## Total graph size: 30351
##
## Initializing model
##
## Iterations = 2001:7000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta[1] 0.142625 0.07702 0.0006289 0.0119213
## beta[2] -0.023163 0.04214 0.0003441 0.0030536
## beta[3] -0.011459 0.03114 0.0002543 0.0018405
## beta[4] 0.011208 0.02856 0.0002332 0.0016777
## beta[5] -0.051088 0.03992 0.0003260 0.0028840
## beta[6] 0.056756 0.02770 0.0002262 0.0017020
## beta[7] -0.118201 0.03868 0.0003158 0.0027632
## beta[8] -0.069170 0.03759 0.0003069 0.0025387
## beta[9] 0.182970 0.02611 0.0002132 0.0011002
## beta[10] 0.146736 0.02504 0.0002044 0.0011668
## beta[11] -0.082141 0.03818 0.0003117 0.0026018
## beta[12] -0.463894 0.05042 0.0004117 0.0045022
## beta[13] -0.072960 0.01901 0.0001552 0.0007699
## beta[14] -0.013796 0.03735 0.0003050 0.0025286
## beta[15] 0.009024 0.03022 0.0002468 0.0016209
## beta[16] -0.054582 0.03840 0.0003135 0.0025329
## sd_state 0.436995 0.05468 0.0004464 0.0012503
## state_effect[1] 0.863574 0.23218 0.0018957 0.0095811
## state_effect[2] -0.122947 0.12887 0.0010522 0.0067611
## state_effect[3] 0.338603 0.14635 0.0011950 0.0063789
## state_effect[4] 0.309750 0.09447 0.0007714 0.0098133
## state_effect[5] -0.010021 0.11830 0.0009659 0.0074396
## state_effect[6] -0.186314 0.10455 0.0008536 0.0087021
## state_effect[7] 0.326144 0.25677 0.0020965 0.0039380
## state_effect[8] 0.680644 0.10433 0.0008518 0.0080472
## state_effect[9] -0.092510 0.10945 0.0008936 0.0098253
## state_effect[10] -0.074593 0.13250 0.0010818 0.0070235
## state_effect[11] -0.207516 0.11714 0.0009564 0.0074116
## state_effect[12] 0.362456 0.15821 0.0012918 0.0056252
## state_effect[13] 0.250807 0.25477 0.0020802 0.0041882
## state_effect[14] 0.260817 0.10641 0.0008689 0.0088897
## state_effect[15] 0.188518 0.23150 0.0018902 0.0130610
## state_effect[16] -0.742247 0.12912 0.0010543 0.0068784
## state_effect[17] 0.394849 0.15706 0.0012824 0.0061445
## state_effect[18] 0.316471 0.09926 0.0008104 0.0090530
## state_effect[19] 0.105741 0.22273 0.0018186 0.0105059
## state_effect[20] -0.841796 0.13841 0.0011301 0.0063628
## state_effect[21] 0.045336 0.10188 0.0008319 0.0091516
## state_effect[22] 0.063865 0.14839 0.0012116 0.0072430
## state_effect[23] -0.261969 0.25669 0.0020958 0.0115086
## state_effect[24] 0.062172 0.09146 0.0007468 0.0089778
## state_effect[25] 0.346816 0.16736 0.0013665 0.0066203
## state_effect[26] -0.236581 0.11371 0.0009285 0.0074426
## state_effect[27] 0.257129 0.09991 0.0008157 0.0088768
## state_effect[28] -0.775080 0.17329 0.0014149 0.0055332
## state_effect[29] -0.357501 0.09329 0.0007617 0.0080934
## state_effect[30] -0.001848 0.10539 0.0008605 0.0085313
## state_effect[31] -0.217137 0.11936 0.0009746 0.0067123
## state_effect[32] -0.338300 0.11139 0.0009095 0.0080844
## state_effect[33] -0.306962 0.14452 0.0011800 0.0073320
## state_effect[34] 0.753150 0.16392 0.0013384 0.0080032
## state_effect[35] 0.100412 0.18963 0.0015483 0.0057947
## state_effect[36] 0.503086 0.13017 0.0010628 0.0063104
## state_effect[37] 0.264522 0.09414 0.0007686 0.0083274
## state_effect[38] -0.626753 0.13656 0.0011150 0.0078959
## state_effect[39] -0.846335 0.23877 0.0019496 0.0056908
## state_effect[40] -0.228931 0.11548 0.0009429 0.0087395
## state_effect[41] -0.181061 0.09398 0.0007674 0.0090583
## state_effect[42] -0.075257 0.14665 0.0011974 0.0059294
## state_effect[43] -0.277982 0.11356 0.0009272 0.0083090
## state_effect[44] -0.244594 0.20419 0.0016672 0.0050256
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] -0.018799 0.1001621 0.147061 0.189863 0.281527
## beta[2] -0.105378 -0.0507491 -0.023931 0.005683 0.060107
## beta[3] -0.072141 -0.0330122 -0.011472 0.009907 0.048618
## beta[4] -0.043092 -0.0082842 0.010673 0.030051 0.069169
## beta[5] -0.134946 -0.0767697 -0.049902 -0.024045 0.024989
## beta[6] 0.002927 0.0381378 0.056762 0.074853 0.111743
## beta[7] -0.196178 -0.1441539 -0.117399 -0.091953 -0.043154
## beta[8] -0.142824 -0.0944452 -0.069384 -0.043473 0.004489
## beta[9] 0.132943 0.1644463 0.183100 0.201500 0.233058
## beta[10] 0.096911 0.1296280 0.147129 0.164150 0.194497
## beta[11] -0.155616 -0.1084727 -0.082775 -0.055425 -0.007689
## beta[12] -0.556985 -0.4993849 -0.464828 -0.431384 -0.359253
## beta[13] -0.109970 -0.0857433 -0.072769 -0.060341 -0.036382
## beta[14] -0.091364 -0.0375376 -0.012436 0.011947 0.055458
## beta[15] -0.048889 -0.0119937 0.009157 0.029886 0.067375
## beta[16] -0.129895 -0.0803166 -0.055667 -0.029466 0.023380
## sd_state 0.343934 0.3992818 0.431892 0.469925 0.559048
## state_effect[1] 0.351007 0.7265441 0.877500 1.017273 1.287512
## state_effect[2] -0.373925 -0.2080896 -0.125836 -0.040166 0.139868
## state_effect[3] 0.036325 0.2455530 0.343953 0.437209 0.612301
## state_effect[4] 0.125085 0.2494623 0.306428 0.365691 0.509042
## state_effect[5] -0.240946 -0.0884834 -0.011734 0.066425 0.229253
## state_effect[6] -0.388283 -0.2543630 -0.188830 -0.118307 0.024746
## state_effect[7] -0.272030 0.1856218 0.349146 0.494249 0.778300
## state_effect[8] 0.486356 0.6117058 0.674502 0.747143 0.896639
## state_effect[9] -0.305800 -0.1637477 -0.093692 -0.022391 0.126565
## state_effect[10] -0.331269 -0.1620071 -0.076136 0.010686 0.195463
## state_effect[11] -0.421430 -0.2884594 -0.213833 -0.133464 0.038424
## state_effect[12] 0.041580 0.2614231 0.365468 0.469157 0.666860
## state_effect[13] -0.303366 0.1051692 0.262509 0.412694 0.727928
## state_effect[14] 0.048314 0.1919572 0.259826 0.331029 0.469766
## state_effect[15] -0.197904 0.0206212 0.160181 0.328290 0.706586
## state_effect[16] -0.990768 -0.8288608 -0.745045 -0.658096 -0.477103
## state_effect[17] 0.057364 0.2997414 0.402830 0.500879 0.681215
## state_effect[18] 0.127356 0.2509802 0.313532 0.381097 0.520334
## state_effect[19] -0.466241 0.0001224 0.136114 0.251734 0.456029
## state_effect[20] -1.106953 -0.9333942 -0.845453 -0.755743 -0.555469
## state_effect[21] -0.155410 -0.0210669 0.044840 0.112510 0.249727
## state_effect[22] -0.243098 -0.0309689 0.067838 0.162286 0.346263
## state_effect[23] -0.720205 -0.4504500 -0.277333 -0.086264 0.261703
## state_effect[24] -0.108521 0.0028806 0.056421 0.117046 0.256868
## state_effect[25] 0.006843 0.2410097 0.348137 0.453427 0.680486
## state_effect[26] -0.459751 -0.3097560 -0.237196 -0.164821 -0.005908
## state_effect[27] 0.061914 0.1923149 0.254830 0.321964 0.459790
## state_effect[28] -1.095562 -0.8937048 -0.780847 -0.662100 -0.423551
## state_effect[29] -0.531814 -0.4194863 -0.361626 -0.297457 -0.163120
## state_effect[30] -0.207753 -0.0710554 -0.004107 0.066561 0.205621
## state_effect[31] -0.450677 -0.2975611 -0.216905 -0.138444 0.018726
## state_effect[32] -0.548426 -0.4124777 -0.342302 -0.265441 -0.116062
## state_effect[33] -0.571459 -0.4079511 -0.311501 -0.212643 -0.005069
## state_effect[34] 0.450711 0.6377064 0.745604 0.861280 1.087128
## state_effect[35] -0.268916 -0.0253091 0.099217 0.226786 0.474818
## state_effect[36] 0.241901 0.4180510 0.503740 0.589993 0.755895
## state_effect[37] 0.065519 0.2064774 0.263760 0.323081 0.454810
## state_effect[38] -0.888578 -0.7182833 -0.629910 -0.537495 -0.350722
## state_effect[39] -1.258966 -1.0046711 -0.867509 -0.715666 -0.290736
## state_effect[40] -0.455498 -0.3057324 -0.231141 -0.152708 -0.001289
## state_effect[41] -0.358784 -0.2427812 -0.184182 -0.121448 0.014133
## state_effect[42] -0.360201 -0.1734951 -0.077401 0.021677 0.220205
## state_effect[43] -0.493231 -0.3524673 -0.280973 -0.204983 -0.047879
## state_effect[44] -0.644217 -0.3775863 -0.244903 -0.110795 0.161436
The summary of this Bayesian model provides insight into the relative
influence of various predictors on the target variable. A few key
predictors emerge as particularly impactful. For example,
Illegitimate_Births and Large_Families show
strong positive associations with the target, meaning higher values of
these variables tend to increase the predicted outcome. This positive
effect is consistent across the samples, as indicated by relatively
narrow credible intervals that do not include zero. On the other hand,
variables like Families_2Parents and
Inc_from_inv exhibit clear negative effects, suggesting
that higher values in these predictors are associated with a decrease in
the target. The confidence in these negative relationships is
underscored by credible intervals that remain below zero, reinforcing
the idea that these variables reliably contribute to lowering the
predicted outcome.
There are, however, some predictors with more ambiguous or mixed
effects. For instance, variables such as Teen_2Par and
Median_Income have wider credible intervals that encompass
zero, indicating they may not exert a consistent or strong influence on
the target. This uncertainty suggests that, within the context of this
model, these predictors do not contribute significantly to explaining
the variation in the outcome.
Additionally, the model incorporates random effects for
State, which capture variability at the state level that
could arise from unobserved regional factors. This addition helps
control for regional differences, thereby refining the accuracy of the
fixed effects. By adjusting for state-level variability, the model can
offer a more accurate assessment of the impact of individual predictors
while accounting for unmeasured state-specific influences.
The model’s structure, particularly the use of a beta distribution
for the target with an individual dispersion parameter
(phi) for each observation, reflects an approach tailored
to data that lie between 0 and 1. This setup helps address variability
effectively across observations and enhances the model’s robustness in
capturing the nuances of the response variable. Overall, the model
reveals that certain predictors, such as
Illegitimate_Births and Families_2Parents,
play significant roles, while others appear to have a more marginal or
uncertain impact. The inclusion of both fixed and random effects makes
the model a well-rounded framework, capable of balancing
individual-level and state-level variability, thus enhancing the
reliability of its predictions and parameter estimates.
For the model check, a posterior predictive check plot and the
Deviance Information Criterion (DIC) were employed.
The posterior predictive check plot compares the observed data with
the data generated by the model, helping to assess how well the model
captures the underlying structure of the data.
The DIC is a
statistical measure used to evaluate the predictive accuracy of a
Bayesian model, taking into account both the goodness of fit and the
complexity of the model.
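The trade-off between fit and complexity in the DIC can be illustrated with the classical definition, DIC = mean deviance + pD, where pD is the mean deviance minus the deviance at the posterior mean. The sketch below applies it to a toy normal model with known variance. The data, the draws, and the function names are hypothetical, and the software used for this report may estimate the complexity penalty differently.

```python
import math

def normal_deviance(ys, mu, sigma=1.0):
    """Deviance (-2 * log-likelihood) of data ys under Normal(mu, sigma)."""
    log_lik = sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        - (y - mu) ** 2 / (2.0 * sigma ** 2)
        for y in ys
    )
    return -2.0 * log_lik

def dic_from_draws(ys, mu_draws):
    """Return (DIC, pD) from posterior draws of the mean parameter."""
    deviances = [normal_deviance(ys, mu) for mu in mu_draws]
    mean_deviance = sum(deviances) / len(deviances)
    mu_bar = sum(mu_draws) / len(mu_draws)
    p_d = mean_deviance - normal_deviance(ys, mu_bar)  # effective parameters
    return mean_deviance + p_d, p_d

# Toy data and made-up posterior draws, for illustration only.
ys = [0.0, 1.0]
mu_draws = [0.0, 1.0]
dic_value, p_d = dic_from_draws(ys, mu_draws)
```

Lower DIC values indicate a better balance of fit and complexity, so the criterion is mainly useful for comparing candidate models fitted to the same data rather than as an absolute score.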
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1038
## Unobserved stochastic nodes: 2137
## Total graph size: 33465
##
## Initializing model
This posterior predictive check plot compares the density of the observed crime rate data (in dark blue) with the model’s predictions (in light blue). The model’s predictive distribution closely follows the general shape of the observed data, suggesting that the model captures the main characteristics of the data. However, it slightly overestimates the density in the mid-range (around 0.5) and underestimates it in some lower and upper parts of the distribution. These discrepancies indicate that while the model provides a reasonable fit, there may be room for refinement to better capture the tails of the distribution.
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1038
## Unobserved stochastic nodes: 1099
## Total graph size: 30351
##
## Initializing model
## [1] "Single DIC value for the model: -0.65879045950387"
The Raftery-Lewis diagnostic is used to calculate the number of iterations required for the Markov Chain Monte Carlo (MCMC) sampler to reach sufficient precision. It determines how many iterations are needed to estimate a quantile of the posterior distribution with a specified accuracy and probability. We therefore use it to check whether the number of samples we chose is adequate; in this case the required number is 3746.
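The figure of 3746 can be reproduced from the formula for the minimum sample size of the Raftery-Lewis diagnostic, which assumes independent draws. The sketch below assumes the commonly used settings q = 0.025 (quantile), r = 0.005 (accuracy) and s = 0.95 (probability); the report does not state its settings, so these values are an assumption, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def raftery_lewis_min_samples(q=0.025, r=0.005, s=0.95):
    """Minimum number of independent draws needed to estimate the q-quantile
    to within +/- r (on the probability scale) with probability s."""
    z = NormalDist().inv_cdf((1.0 + s) / 2.0)
    return math.ceil(q * (1.0 - q) * (z / r) ** 2)

n_min = raftery_lewis_min_samples()
```

With autocorrelated chains the diagnostic inflates this minimum by a dependence factor, so the 5000 draws per chain should be compared against the inflated requirement for each parameter rather than against the minimum alone.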
To evaluate MCMC convergence, trace plots, density plots, and the R-hat values from the model summary were used.
Traceplot
The trace plots provide valuable insights into the convergence and mixing of the MCMC chains for the Bayesian hierarchical model. Each plot represents the sampling process for a different parameter across the three chains.
The frequent crossing over of chains indicates good mixing, suggesting that the MCMC sampler is exploring the parameter space efficiently. There are no signs of divergence or significant drift, which would be evident if the chains moved in a consistent direction without crossing. Instead, the chains hover around a stable mean, indicating convergence.
Furthermore, the chains appear stationary, with fluctuations occurring around a consistent mean, suggesting that the MCMC process has likely reached a stable distribution. The trace plots do not display the effective sample size (ESS), so this visual evidence of good mixing and parameter stability should be read together with the ESS values reported in the model summary.
Density plot
The density plots of the posterior distributions for the parameters reveal several important insights:
Firstly, the absence of multimodal behavior is evident, indicating that the MCMC chains are sampling from a single mode of the posterior distribution. This is beneficial as it suggests there are no issues related to multiple modes, which can complicate the interpretation of results.
Secondly, the overlapping density curves from different chains show strong agreement among the chains. This overlap further supports the notion of convergence, affirming that all chains are sampling from the same posterior distribution.
Lastly, the smooth and unimodal shapes of the density plots suggest that the parameter estimates are well-defined and stable. The density plots illustrate the uncertainty around the parameter estimates, with narrower peaks indicating more precise estimates.
To validate the accuracy of our Bayesian hierarchical model, we perform an error check using several statistical metrics. By extracting posterior samples and summarizing key statistics such as the mean, median, standard deviation (SD), median absolute deviation (MAD), Monte Carlo Standard Error (MCSE), R-hat, and Effective Sample Size (ESS), we can assess the convergence and precision of our parameter estimates.
## # A tibble: 61 × 10
## variable mean median sd mad mcse_mean mcse_sd rhat ess_bulk
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 beta[1] 0.129 1.32e-1 0.0728 0.0693 0.00884 5.54e-3 1.01 69.6
## 2 beta[2] -0.0217 -2.28e-2 0.0439 0.0453 0.00314 1.43e-3 1.01 196.
## 3 beta[3] -0.0129 -1.43e-2 0.0300 0.0301 0.00182 7.86e-4 1.00 273.
## 4 beta[4] 0.0115 1.14e-2 0.0291 0.0289 0.00189 9.88e-4 1.00 238.
## 5 beta[5] -0.0482 -4.79e-2 0.0439 0.0419 0.00368 1.93e-3 1.01 144.
## 6 beta[6] 0.0542 5.38e-2 0.0301 0.0300 0.00255 1.08e-3 1.00 139.
## 7 beta[7] -0.115 -1.14e-1 0.0379 0.0401 0.00283 1.09e-3 1.00 181.
## 8 beta[8] -0.0739 -7.25e-2 0.0379 0.0366 0.00245 1.35e-3 1.00 242.
## 9 beta[9] 0.183 1.83e-1 0.0257 0.0259 0.00110 4.53e-4 1.00 541.
## 10 beta[10] 0.147 1.47e-1 0.0255 0.0253 0.00134 6.05e-4 1.00 363.
## 11 beta[11] -0.0870 -8.66e-2 0.0377 0.0367 0.00257 1.25e-3 1.00 215.
## 12 beta[12] -0.463 -4.62e-1 0.0474 0.0456 0.00369 2.10e-3 1.00 166.
## 13 beta[13] -0.0719 -7.17e-2 0.0192 0.0195 0.000934 3.46e-4 1.00 424.
## 14 beta[14] -0.0112 -1.13e-2 0.0402 0.0419 0.00304 1.53e-3 1.00 176.
## 15 beta[15] 0.0117 1.16e-2 0.0331 0.0331 0.00187 9.42e-4 1.00 312.
## 16 beta[16] -0.0551 -5.59e-2 0.0376 0.0387 0.00273 1.10e-3 1.00 192.
## 17 sd_state 0.437 4.32e-1 0.0544 0.0540 0.00108 5.88e-4 1.00 2597.
## 18 state_effec… 0.859 8.71e-1 0.236 0.223 0.0116 4.61e-3 1.00 365.
## 19 state_effec… -0.115 -1.19e-1 0.124 0.123 0.00801 3.07e-3 1.00 240.
## 20 state_effec… 0.353 3.57e-1 0.145 0.144 0.00850 3.15e-3 1.00 282.
## 21 state_effec… 0.323 3.23e-1 0.0885 0.0829 0.00949 5.50e-3 1.00 87.5
## 22 state_effec… 0.00120 4.35e-4 0.119 0.119 0.00902 3.80e-3 1.00 177.
## 23 state_effec… -0.175 -1.76e-1 0.104 0.102 0.00997 5.12e-3 1.00 110.
## 24 state_effec… 0.337 3.56e-1 0.260 0.233 0.00513 3.32e-3 1.00 2164.
## 25 state_effec… 0.695 6.94e-1 0.100 0.0990 0.00921 4.68e-3 1.00 119.
## 26 state_effec… -0.0761 -7.52e-2 0.106 0.104 0.00839 3.39e-3 1.00 161.
## 27 state_effec… -0.0599 -6.06e-2 0.131 0.128 0.00746 2.60e-3 1.00 311.
## 28 state_effec… -0.197 -2.02e-1 0.117 0.114 0.00881 3.95e-3 1.00 181.
## 29 state_effec… 0.370 3.69e-1 0.161 0.151 0.00932 4.14e-3 1.00 293.
## 30 state_effec… 0.257 2.75e-1 0.248 0.221 0.00438 3.02e-3 1.00 2722.
## 31 state_effec… 0.276 2.76e-1 0.104 0.104 0.00807 2.97e-3 1.00 165.
## 32 state_effec… 0.238 2.03e-1 0.247 0.248 0.0168 7.65e-3 1.00 224.
## 33 state_effec… -0.731 -7.33e-1 0.131 0.127 0.00910 3.78e-3 1.00 208.
## 34 state_effec… 0.404 4.12e-1 0.156 0.152 0.00883 3.24e-3 1.00 300.
## 35 state_effec… 0.327 3.25e-1 0.0958 0.0916 0.00866 4.60e-3 1.00 123.
## 36 state_effec… 0.120 1.43e-1 0.210 0.187 0.0106 7.38e-3 1.00 383.
## 37 state_effec… -0.831 -8.36e-1 0.139 0.134 0.00771 3.12e-3 1.00 322.
## 38 state_effec… 0.0581 5.61e-2 0.0998 0.0984 0.00828 3.57e-3 1.00 147.
## 39 state_effec… 0.0812 8.46e-2 0.149 0.145 0.00849 2.85e-3 1.00 297.
## 40 state_effec… -0.244 -2.53e-1 0.247 0.257 0.0106 5.46e-3 1.00 548.
## 41 state_effec… 0.0745 7.20e-2 0.0937 0.0925 0.00944 5.64e-3 1.00 101.
## 42 state_effec… 0.363 3.61e-1 0.160 0.155 0.00859 2.62e-3 1.00 350.
## 43 state_effec… -0.231 -2.31e-1 0.115 0.114 0.00955 4.49e-3 1.00 148.
## 44 state_effec… 0.275 2.75e-1 0.0963 0.0950 0.00863 3.60e-3 1.00 124.
## 45 state_effec… -0.767 -7.74e-1 0.171 0.168 0.00686 1.93e-3 1.00 609.
## 46 state_effec… -0.342 -3.44e-1 0.0912 0.0892 0.00853 4.62e-3 1.00 117.
## 47 state_effec… 0.0117 1.02e-2 0.100 0.100 0.00768 3.28e-3 1.00 171.
## 48 state_effec… -0.202 -2.05e-1 0.115 0.118 0.00808 3.11e-3 1.00 206.
## 49 state_effec… -0.320 -3.23e-1 0.110 0.108 0.00901 3.79e-3 1.00 149.
## 50 state_effec… -0.293 -2.96e-1 0.144 0.140 0.00948 4.13e-3 1.00 234.
## 51 state_effec… 0.772 7.66e-1 0.159 0.160 0.00912 3.06e-3 1.00 307.
## 52 state_effec… 0.114 1.15e-1 0.189 0.184 0.00691 2.17e-3 1.00 745.
## 53 state_effec… 0.513 5.11e-1 0.129 0.130 0.00832 3.53e-3 1.00 242.
## 54 state_effec… 0.274 2.73e-1 0.0881 0.0861 0.00846 4.24e-3 1.00 109.
## 55 state_effec… -0.609 -6.11e-1 0.135 0.130 0.00945 3.78e-3 1.00 208.
## 56 state_effec… -0.840 -8.60e-1 0.242 0.216 0.00624 4.86e-3 1.00 1384.
## 57 state_effec… -0.211 -2.10e-1 0.110 0.108 0.00958 4.00e-3 1.00 132.
## 58 state_effec… -0.165 -1.68e-1 0.0907 0.0895 0.00933 4.76e-3 1.00 97.6
## 59 state_effec… -0.0722 -7.29e-2 0.145 0.143 0.00763 2.66e-3 1.00 362.
## 60 state_effec… -0.269 -2.70e-1 0.113 0.114 0.0100 3.85e-3 1.00 127.
## 61 state_effec… -0.237 -2.38e-1 0.206 0.204 0.00525 2.07e-3 1.00 1524.
## # ℹ 1 more variable: ess_tail <dbl>
The summary of the beta parameters reveals valuable insights into the
model’s findings. The intercept, with a mean of 0.13, indicates a
positive baseline effect on the response variable. Notably,
beta[6], representing the impact of individuals with a
Bachelor’s degree or higher, has a mean estimate of 0.06, suggesting a
slight positive influence on the outcome.
Conversely, the parameter for Families_2Parents
(beta[12]) exhibits a substantial negative effect, with a
mean of -0.46. This indicates that communities with a larger share of
two-parent families tend to have lower values of the response variable,
which points to family stability as a protective factor. Similarly,
beta[2], which corresponds to the effect of
YoungKids_2Par, has a mean of -0.03, suggesting a small
negative impact.
In terms of uncertainty, the standard deviations (SD) for most parameters are relatively low, indicating that the estimates are stable. The R-hat values around 1 further confirm that the chains have converged, bolstering confidence in these parameter estimates. Overall, these results provide a clear view of how different covariates influence the response variable in the model.
To enhance the model’s formula, both the summary of the base model and the interactions between the target variable and selected predictors highlighted in the summary analysis were used:
The plot illustrates the relationships among various socio-economic variables and their correlation with crime rates per 100,000 people. It shows that while most areas have low crime rates, a few areas experience significantly higher rates, highlighting a concentration of crime in specific regions.
Key findings include negative correlations between employment and crime rates, indicating that higher employment is associated with lower crime rates, and positive correlations between poverty and crime rates, suggesting that higher poverty levels are linked to higher crime rates. Additionally, higher rates of illegitimate births, larger families, poorer English proficiency, and greater reliance on public assistance are all positively correlated with crime rates. Conversely, higher median income and more two-parent families are negatively correlated with crime rates, indicating that these factors are associated with lower crime rates.
Interrelationships among predictors reveal that higher employment is associated with lower poverty and higher median income, while more two-parent families are associated with higher median incomes and lower poverty levels. This overall pattern suggests that socio-economic stability, characterized by higher employment, higher income, and more two-parent families, is negatively correlated with crime rates, whereas socio-economic challenges, such as higher poverty, greater reliance on welfare, and higher rates of illegitimate births, are positively correlated with crime rates.
The resulting model, considering both the result of base_model and the plot above, will be the following:
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1038
## Unobserved stochastic nodes: 1104
## Total graph size: 32439
##
## Initializing model
##
## Iterations = 2001:7000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta[1] 0.209620 0.06109 0.0004988 0.0072805
## beta[2] 0.047433 0.02775 0.0002265 0.0015276
## beta[3] -0.008284 0.05310 0.0004335 0.0050878
## beta[4] 0.272605 0.03521 0.0002875 0.0017186
## beta[5] 0.107958 0.02397 0.0001957 0.0012204
## beta[6] -0.090327 0.03030 0.0002474 0.0018204
## beta[7] 0.043193 0.05324 0.0004347 0.0051696
## beta[8] -0.462575 0.04359 0.0003559 0.0031458
## beta[9] 0.013042 0.03980 0.0003250 0.0025494
## beta[10] -0.015625 0.02980 0.0002433 0.0016077
## beta[11] -0.079691 0.01940 0.0001584 0.0008619
## beta[12] -0.012211 0.03913 0.0003195 0.0026789
## beta[13] 0.003139 0.02845 0.0002323 0.0017893
## beta[14] -0.134771 0.04786 0.0003908 0.0032796
## beta[15] 0.010836 0.03365 0.0002747 0.0021409
## beta[16] -0.033512 0.01591 0.0001299 0.0005187
## beta[17] 0.035497 0.01911 0.0001561 0.0009534
## beta[18] 0.033575 0.03962 0.0003235 0.0036716
## beta[19] 0.060036 0.02275 0.0001858 0.0012007
## beta[20] -0.028342 0.02729 0.0002228 0.0016258
## beta[21] 0.006972 0.02401 0.0001961 0.0012262
## sd_state 0.416239 0.05195 0.0004241 0.0008989
## state_effect[1] 0.591069 0.20919 0.0017080 0.0060195
## state_effect[2] 0.021934 0.12022 0.0009816 0.0061542
## state_effect[3] 0.273497 0.14593 0.0011915 0.0055297
## state_effect[4] 0.243956 0.07742 0.0006321 0.0059630
## state_effect[5] 0.032576 0.10827 0.0008840 0.0050936
## state_effect[6] -0.240582 0.08827 0.0007207 0.0057244
## state_effect[7] 0.291657 0.25144 0.0020530 0.0034935
## state_effect[8] 0.656035 0.09450 0.0007716 0.0059689
## state_effect[9] -0.013212 0.09778 0.0007984 0.0061072
## state_effect[10] -0.054908 0.12054 0.0009842 0.0040291
## state_effect[11] -0.266105 0.10105 0.0008251 0.0046464
## state_effect[12] 0.307451 0.15944 0.0013019 0.0054369
## state_effect[13] 0.104759 0.22798 0.0018614 0.0037395
## state_effect[14] 0.185343 0.09762 0.0007971 0.0055783
## state_effect[15] 0.525195 0.21741 0.0017751 0.0106247
## state_effect[16] -0.700877 0.11837 0.0009665 0.0042644
## state_effect[17] 0.477349 0.14481 0.0011824 0.0042359
## state_effect[18] 0.298759 0.08628 0.0007044 0.0066520
## state_effect[19] 0.078166 0.19744 0.0016121 0.0063677
## state_effect[20] -0.709775 0.13123 0.0010714 0.0053908
## state_effect[21] 0.049575 0.08793 0.0007179 0.0054536
## state_effect[22] 0.012863 0.14152 0.0011555 0.0045046
## state_effect[23] -0.305494 0.25076 0.0020474 0.0108752
## state_effect[24] 0.042375 0.08145 0.0006650 0.0062752
## state_effect[25] 0.375058 0.15874 0.0012961 0.0043282
## state_effect[26] -0.328019 0.11189 0.0009136 0.0054628
## state_effect[27] 0.197508 0.08571 0.0006998 0.0050606
## state_effect[28] -0.745138 0.16970 0.0013856 0.0045362
## state_effect[29] -0.343172 0.08256 0.0006741 0.0047497
## state_effect[30] 0.001327 0.09111 0.0007439 0.0046746
## state_effect[31] -0.240494 0.11721 0.0009570 0.0055833
## state_effect[32] -0.281232 0.09379 0.0007658 0.0044296
## state_effect[33] -0.341424 0.12204 0.0009965 0.0046056
## state_effect[34] 0.762092 0.14764 0.0012055 0.0052279
## state_effect[35] 0.087440 0.18322 0.0014960 0.0041189
## state_effect[36] 0.433655 0.10840 0.0008851 0.0054545
## state_effect[37] 0.335453 0.07679 0.0006270 0.0053216
## state_effect[38] -0.542369 0.14071 0.0011489 0.0050703
## state_effect[39] -0.860840 0.24750 0.0020208 0.0050913
## state_effect[40] -0.266361 0.09581 0.0007823 0.0052395
## state_effect[41] -0.176246 0.07956 0.0006496 0.0054072
## state_effect[42] -0.079329 0.14172 0.0011571 0.0046672
## state_effect[43] -0.339101 0.09903 0.0008086 0.0055529
## state_effect[44] -0.208120 0.21061 0.0017196 0.0045006
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] 0.094290 0.168924 0.2077862 0.246529 0.341043
## beta[2] -0.008417 0.029263 0.0476151 0.066590 0.100210
## beta[3] -0.114428 -0.042090 -0.0091002 0.025889 0.099693
## beta[4] 0.208253 0.247781 0.2707517 0.295936 0.345751
## beta[5] 0.059575 0.091750 0.1083062 0.124773 0.153406
## beta[6] -0.148273 -0.110868 -0.0906906 -0.070476 -0.027960
## beta[7] -0.053522 0.007351 0.0405672 0.076925 0.160806
## beta[8] -0.546555 -0.492350 -0.4636838 -0.433574 -0.376761
## beta[9] -0.068441 -0.013372 0.0142846 0.039659 0.088964
## beta[10] -0.076028 -0.035371 -0.0149868 0.004567 0.041509
## beta[11] -0.118687 -0.092354 -0.0795389 -0.067031 -0.041101
## beta[12] -0.089122 -0.037443 -0.0129407 0.013846 0.066123
## beta[13] -0.054329 -0.015009 0.0040276 0.022286 0.056813
## beta[14] -0.229188 -0.165709 -0.1369720 -0.103165 -0.038693
## beta[15] -0.053122 -0.012473 0.0105112 0.033525 0.076367
## beta[16] -0.064326 -0.044213 -0.0337744 -0.022581 -0.002149
## beta[17] -0.001574 0.022469 0.0354048 0.048476 0.073279
## beta[18] -0.046083 0.007447 0.0340188 0.061015 0.110230
## beta[19] 0.015240 0.044674 0.0601331 0.075445 0.104143
## beta[20] -0.079412 -0.047581 -0.0288189 -0.009392 0.025095
## beta[21] -0.039167 -0.009680 0.0068042 0.023466 0.053984
## sd_state 0.325783 0.380103 0.4121564 0.447748 0.529228
## state_effect[1] 0.140039 0.463911 0.6024153 0.731258 0.970883
## state_effect[2] -0.212962 -0.058717 0.0215281 0.102975 0.263266
## state_effect[3] -0.033793 0.181226 0.2815276 0.372825 0.538898
## state_effect[4] 0.081876 0.194661 0.2453495 0.296987 0.388118
## state_effect[5] -0.186299 -0.038539 0.0337910 0.105910 0.240674
## state_effect[6] -0.423950 -0.297665 -0.2374293 -0.179791 -0.076910
## state_effect[7] -0.279388 0.153536 0.3105729 0.452840 0.749837
## state_effect[8] 0.466262 0.592788 0.6576124 0.721779 0.837943
## state_effect[9] -0.206580 -0.078797 -0.0123237 0.053526 0.173849
## state_effect[10] -0.293309 -0.135614 -0.0550454 0.025221 0.180264
## state_effect[11] -0.460558 -0.332221 -0.2679423 -0.199730 -0.061608
## state_effect[12] -0.032536 0.212282 0.3117600 0.408114 0.620682
## state_effect[13] -0.378404 -0.030641 0.1104586 0.248281 0.554836
## state_effect[14] -0.002470 0.118399 0.1861270 0.250219 0.378585
## state_effect[15] 0.102864 0.373586 0.5287011 0.684329 0.925016
## state_effect[16] -0.927556 -0.782450 -0.7027491 -0.621587 -0.464138
## state_effect[17] 0.185100 0.383255 0.4805084 0.574169 0.759543
## state_effect[18] 0.122252 0.243514 0.2995792 0.355927 0.461518
## state_effect[19] -0.384773 -0.033229 0.1012062 0.213209 0.410109
## state_effect[20] -0.966890 -0.796326 -0.7103690 -0.625128 -0.449275
## state_effect[21] -0.125391 -0.008693 0.0498627 0.109576 0.219169
## state_effect[22] -0.270107 -0.081710 0.0143465 0.108887 0.285892
## state_effect[23] -0.712093 -0.483505 -0.3357972 -0.161805 0.281917
## state_effect[24] -0.122431 -0.010020 0.0452275 0.098701 0.194256
## state_effect[25] 0.045669 0.275461 0.3784400 0.481301 0.674948
## state_effect[26] -0.559511 -0.401981 -0.3235646 -0.251620 -0.118042
## state_effect[27] 0.028502 0.138954 0.1990718 0.256655 0.363620
## state_effect[28] -1.068415 -0.859428 -0.7499560 -0.634214 -0.396430
## state_effect[29] -0.505022 -0.400048 -0.3421863 -0.287682 -0.184903
## state_effect[30] -0.176373 -0.061206 0.0006642 0.063231 0.181708
## state_effect[31] -0.469174 -0.320952 -0.2414569 -0.161635 -0.010286
## state_effect[32] -0.466942 -0.344160 -0.2803570 -0.218573 -0.096746
## state_effect[33] -0.575753 -0.424658 -0.3426209 -0.260408 -0.096337
## state_effect[34] 0.480553 0.659821 0.7591349 0.859875 1.061653
## state_effect[35] -0.267693 -0.034883 0.0833642 0.207038 0.459559
## state_effect[36] 0.221629 0.360425 0.4328309 0.508260 0.644959
## state_effect[37] 0.186292 0.281929 0.3358051 0.389230 0.486449
## state_effect[38] -0.801391 -0.638788 -0.5500597 -0.451486 -0.250519
## state_effect[39] -1.281555 -1.026035 -0.8831396 -0.726323 -0.282128
## state_effect[40] -0.460677 -0.327979 -0.2641889 -0.202812 -0.083582
## state_effect[41] -0.337229 -0.228586 -0.1749380 -0.120616 -0.026932
## state_effect[42] -0.358010 -0.174681 -0.0761281 0.015588 0.198784
## state_effect[43] -0.544499 -0.403571 -0.3360603 -0.273009 -0.151517
## state_effect[44] -0.607261 -0.350902 -0.2099416 -0.068081 0.211197
The summary of the second model highlights the influence of various
predictors on the target variable. The intercept (beta[1]) has a
positive mean of approximately 0.210, suggesting a baseline effect when
predictors are at their reference levels. Key predictors like
Illegitimate_Births (beta[4]) show a clear positive
association, with a mean of around 0.273 and a 95% credible interval
from 0.208 to 0.346, indicating that higher values
lead to increased predicted outcomes.
In contrast, predictors such as Inc_from_inv (beta[6])
and Families_2Parents (beta[8]) have negative mean estimates
(-0.090 and -0.463), indicating that higher values in these variables
are associated with lower predicted outcomes. These negative
relationships are supported by 95% credible intervals that remain below
zero (-0.148 to -0.028 and -0.547 to -0.377, respectively).
Some predictors, like Below_Poverty (beta[3]), exhibit
ambiguous effects, with a mean near zero (-0.008) and a credible interval
crossing zero (-0.114 to 0.100), suggesting no strong influence on the
target variable. The inclusion of interaction terms, such as
Illegitimate_Births * Below_poverty (beta[14], mean -0.135, 95% interval
-0.229 to -0.039), indicates that the effect of one variable depends on
the level of the other.
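To make the role of the interaction term concrete, the sketch below evaluates how the slope of Illegitimate_Births changes with Below_Poverty. It assumes that the linear predictor contains the two main effects plus their product, and it uses the posterior means of beta[4] and beta[14] from the summary above. The function name and the three Below_Poverty values are hypothetical and chosen only for illustration.

```python
# Posterior means copied from the second model's summary (beta[4], beta[14]).
BETA_ILLEGITIMATE = 0.273   # main effect of Illegitimate_Births
BETA_INTERACTION = -0.135   # Illegitimate_Births * Below_Poverty


def illegitimate_births_slope(below_poverty: float) -> float:
    """Change in the linear predictor per unit of Illegitimate_Births,
    holding Below_Poverty fixed (assumes a simple product interaction)."""
    return BETA_ILLEGITIMATE + BETA_INTERACTION * below_poverty


# Hypothetical Below_Poverty levels, in whatever units the model uses.
for level in (0.0, 0.5, 1.0):
    slope = illegitimate_births_slope(level)
    print(f"Below_Poverty = {level:.1f} -> slope = {slope:.3f}")
```

Under these assumptions, the negative interaction coefficient means that the positive association with Illegitimate_Births weakens as Below_Poverty increases.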
Overall, the summary illustrates how specific socio-economic factors influence the target variable, with strong positive correlations for some predictors and weaker or negative relationships for others, reflecting the complexity of these interactions.
To diagnose the model, we compared the posterior predictive check plots and the DIC scores:
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1038
## Unobserved stochastic nodes: 2142
## Total graph size: 35553
##
## Initializing model
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1038
## Unobserved stochastic nodes: 1104
## Total graph size: 32439
##
## Initializing model
## [1] "Single DIC value for the second model: -0.67437109639709"
The Posterior Predictive Check of this second model demonstrates an enhanced fit. The predicted density curves appear to align more closely with the observed data, particularly in the mid to higher ranges of crime rates. This suggests that the inclusion of interaction terms and the exclusion of some covariates might have allowed the model to better account for the underlying relationships in the data.
If compared to the PPC of the first model, the improvement in the second model is especially noticeable as the density curves converge more effectively, indicating that the model has become more adept at predicting crime rates across different socio-economic contexts. Overall, it appears that the modifications introduced in the second model have led to a more robust predictive capability, thereby enhancing the model’s accuracy in estimating crime rates.
The DIC value further supports this observation.
In summary, the posterior predictive checks and the DIC values collectively indicate that the second model outperforms the base model in predicting crime rates. The second model’s improved alignment with the observed data and its lower DIC value highlight its superiority in capturing the underlying patterns within the data.
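For readers unfamiliar with DIC, the following is a minimal sketch of one classic definition: the posterior mean deviance plus an effective-parameter penalty pD, where pD is the mean deviance minus the deviance at the posterior mean. It uses a toy normal model with known standard deviation and simulated draws, not the crime model, and it is not necessarily the calculation that produced the value reported above. All names are hypothetical.

```python
import math
import random
import statistics


def deviance(y, mu, sigma):
    """-2 * log-likelihood of the data y under Normal(mu, sigma)."""
    log_lik = sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (obs - mu) ** 2 / (2 * sigma ** 2)
        for obs in y
    )
    return -2.0 * log_lik


def dic(y, mu_draws, sigma):
    """Return (DIC, pD) with pD = mean deviance - deviance at posterior mean."""
    mean_deviance = statistics.fmean([deviance(y, mu, sigma) for mu in mu_draws])
    deviance_at_mean = deviance(y, statistics.fmean(mu_draws), sigma)
    p_d = mean_deviance - deviance_at_mean
    return mean_deviance + p_d, p_d


# Toy data and toy posterior draws for the single parameter mu.
rng = random.Random(0)
y = [rng.gauss(0.2, 1.0) for _ in range(200)]
y_bar = statistics.fmean(y)
mu_draws = [rng.gauss(y_bar, 1.0 / math.sqrt(len(y))) for _ in range(4000)]

dic_value, p_d = dic(y, mu_draws, sigma=1.0)
print(f"DIC = {dic_value:.1f}, effective parameters pD = {p_d:.2f}")
```

With one free parameter, pD comes out close to 1 in this toy setting. When two models are fitted to the same data, the one with the lower DIC is preferred.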
Traceplot
Density plot
The conclusions we can draw from these traceplots and density plot are the same as the base model:
The trace plots indicate that the MCMC chains for the Bayesian hierarchical model have largely converged, with good mixing and stationarity for most parameters. The numerical diagnostics reported below qualify this picture: R-hat is close to 1 for most parameters, but the intercept and several state effects reach values between 1.05 and 1.10 with bulk ESS below 60, so these estimates should be read with more caution than the rest.
The density plots demonstrate unimodal and smooth distributions for the parameters, suggesting well-defined and stable parameter estimates. The density curves for different chains overlap significantly, indicating strong agreement among the chains and further supporting the convergence of the model. The absence of multimodal behavior in the density plots suggests that the MCMC chains are sampling from a single mode of the posterior distribution, avoiding issues related to multiple modes.
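The agreement between chains described above is what R-hat quantifies. The sketch below implements the basic split-R-hat (each chain is split in half, then between-chain and within-chain variances are compared). It is a simplified illustration on simulated chains; it omits the rank normalization used by more recent implementations, so its values will not exactly match the rhat column of the summary tables.

```python
import random
import statistics


def split_rhat(chains):
    """Basic split-R-hat for a list of equal-length chains of one parameter."""
    halves = []
    for chain in chains:
        mid = len(chain) // 2
        halves.append(chain[:mid])
        halves.append(chain[mid:2 * mid])
    n = len(halves[0])
    half_means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean([statistics.variance(h) for h in halves])  # W
    between = n * statistics.variance(half_means)                        # B
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


rng = random.Random(1)
# Three chains sampling the same distribution: R-hat should be near 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
# One chain stuck around a different value: R-hat should be clearly above 1.
stuck = [
    [rng.gauss(0.0, 1.0) for _ in range(1000)],
    [rng.gauss(0.0, 1.0) for _ in range(1000)],
    [rng.gauss(2.0, 1.0) for _ in range(1000)],
]

rhat_mixed = split_rhat(mixed)
rhat_stuck = split_rhat(stuck)
print(f"well-mixed chains: R-hat = {rhat_mixed:.3f}")
print(f"one stuck chain:   R-hat = {rhat_stuck:.3f}")
```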
After the convergence diagnostics, we check the accuracy of our Bayesian hierarchical model with the following statistics:
## # A tibble: 66 × 10
## variable mean median sd mad mcse_mean mcse_sd rhat ess_bulk
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 beta[1] 0.176 0.175 0.0746 0.0735 0.0174 6.03e-3 1.10 17.6
## 2 beta[2] 0.0514 0.0519 0.0298 0.0287 0.00236 1.03e-3 1.00 160.
## 3 beta[3] -0.00837 -0.00675 0.0539 0.0543 0.00605 2.80e-3 1.00 79.5
## 4 beta[4] 0.269 0.266 0.0345 0.0348 0.00173 8.95e-4 1.00 402.
## 5 beta[5] 0.108 0.108 0.0245 0.0240 0.00152 6.60e-4 1.00 262.
## 6 beta[6] -0.0870 -0.0866 0.0307 0.0313 0.00225 8.83e-4 1.00 187.
## 7 beta[7] 0.0410 0.0407 0.0605 0.0627 0.00724 4.14e-3 1.01 70.6
## 8 beta[8] -0.470 -0.470 0.0470 0.0455 0.00365 2.02e-3 1.00 167.
## 9 beta[9] 0.0171 0.0153 0.0422 0.0417 0.00379 1.55e-3 1.03 123.
## 10 beta[10] -0.0127 -0.0130 0.0306 0.0307 0.00231 1.24e-3 1.00 177.
## 11 beta[11] -0.0815 -0.0814 0.0190 0.0187 0.00106 4.66e-4 1.01 327.
## 12 beta[12] -0.00900 -0.00901 0.0407 0.0437 0.00297 1.75e-3 1.01 187.
## 13 beta[13] 0.00326 0.00408 0.0302 0.0304 0.00231 1.04e-3 1.02 172.
## 14 beta[14] -0.128 -0.127 0.0456 0.0450 0.00300 1.61e-3 1.00 234.
## 15 beta[15] 0.00712 0.00652 0.0327 0.0320 0.00217 1.04e-3 1.00 232.
## 16 beta[16] -0.0341 -0.0339 0.0153 0.0154 0.000503 2.29e-4 1.00 929.
## 17 beta[17] 0.0352 0.0351 0.0197 0.0194 0.00137 7.79e-4 1.01 208.
## 18 beta[18] 0.0329 0.0310 0.0461 0.0464 0.00554 2.85e-3 1.01 69.6
## 19 beta[19] 0.0618 0.0610 0.0225 0.0233 0.00136 5.42e-4 1.02 280.
## 20 beta[20] -0.0352 -0.0353 0.0273 0.0276 0.00162 9.00e-4 1.00 283.
## 21 beta[21] 0.0127 0.0130 0.0235 0.0239 0.00127 6.33e-4 1.00 344.
## 22 sd_state 0.419 0.415 0.0524 0.0508 0.00113 6.23e-4 1.00 2092.
## 23 state_effec… 0.611 0.619 0.209 0.200 0.0118 2.83e-3 1.02 304.
## 24 state_effec… 0.0488 0.0506 0.121 0.121 0.0114 3.01e-3 1.02 113.
## 25 state_effec… 0.322 0.330 0.148 0.144 0.00901 3.86e-3 1.01 271.
## 26 state_effec… 0.274 0.273 0.0873 0.0822 0.0197 6.08e-3 1.09 18.1
## 27 state_effec… 0.0698 0.0707 0.113 0.115 0.0107 3.12e-3 1.04 112.
## 28 state_effec… -0.209 -0.209 0.0955 0.0975 0.0145 3.76e-3 1.06 43.7
## 29 state_effec… 0.316 0.337 0.253 0.226 0.00565 3.56e-3 1.00 1750.
## 30 state_effec… 0.689 0.689 0.101 0.103 0.0137 3.96e-3 1.05 53.9
## 31 state_effec… 0.0152 0.0152 0.104 0.105 0.0112 3.40e-3 1.02 86.0
## 32 state_effec… -0.0236 -0.0232 0.124 0.122 0.00829 2.10e-3 1.02 221.
## 33 state_effec… -0.234 -0.234 0.108 0.111 0.0127 2.33e-3 1.03 72.7
## 34 state_effec… 0.334 0.337 0.155 0.146 0.00869 3.69e-3 1.01 317.
## 35 state_effec… 0.135 0.141 0.241 0.218 0.00525 3.22e-3 1.00 1933.
## 36 state_effec… 0.219 0.219 0.106 0.106 0.0132 3.05e-3 1.03 64.5
## 37 state_effec… 0.565 0.566 0.211 0.222 0.0106 5.00e-3 1.00 398.
## 38 state_effec… -0.675 -0.677 0.125 0.123 0.0128 3.17e-3 1.03 94.6
## 39 state_effec… 0.505 0.509 0.154 0.149 0.00978 2.67e-3 1.02 237.
## 40 state_effec… 0.332 0.331 0.0943 0.0918 0.0268 4.79e-3 1.09 12.2
## 41 state_effec… 0.0972 0.120 0.205 0.189 0.00760 4.69e-3 1.00 699.
## 42 state_effec… -0.676 -0.678 0.135 0.135 0.00816 2.34e-3 1.01 275.
## 43 state_effec… 0.0794 0.0811 0.0953 0.0962 0.0108 3.02e-3 1.04 78.3
## 44 state_effec… 0.0472 0.0485 0.149 0.148 0.0114 2.17e-3 1.02 170.
## 45 state_effec… -0.296 -0.320 0.227 0.215 0.0147 6.85e-3 1.01 233.
## 46 state_effec… 0.0707 0.0710 0.0886 0.0866 0.0148 5.07e-3 1.07 35.3
## 47 state_effec… 0.410 0.413 0.165 0.160 0.00684 2.88e-3 1.01 572.
## 48 state_effec… -0.305 -0.306 0.117 0.117 0.0154 3.66e-3 1.05 57.6
## 49 state_effec… 0.232 0.232 0.0918 0.0928 0.0115 3.26e-3 1.03 63.3
## 50 state_effec… -0.719 -0.720 0.171 0.170 0.00714 1.94e-3 1.01 572.
## 51 state_effec… -0.312 -0.309 0.0891 0.0897 0.0128 3.86e-3 1.06 48.6
## 52 state_effec… 0.0381 0.0371 0.101 0.102 0.0138 3.36e-3 1.05 54.2
## 53 state_effec… -0.207 -0.209 0.123 0.124 0.00868 2.65e-3 1.01 197.
## 54 state_effec… -0.249 -0.249 0.0983 0.0976 0.0103 2.96e-3 1.03 91.8
## 55 state_effec… -0.310 -0.313 0.128 0.124 0.0114 3.38e-3 1.03 125.
## 56 state_effec… 0.795 0.791 0.157 0.154 0.0115 4.00e-3 1.02 183.
## 57 state_effec… 0.111 0.111 0.183 0.179 0.00731 2.02e-3 1.01 625.
## 58 state_effec… 0.471 0.471 0.114 0.113 0.0112 3.43e-3 1.02 104.
## 59 state_effec… 0.372 0.374 0.0850 0.0849 0.0103 3.66e-3 1.03 69.2
## 60 state_effec… -0.503 -0.510 0.144 0.138 0.00780 3.05e-3 1.02 329.
## 61 state_effec… -0.839 -0.863 0.254 0.226 0.00688 4.07e-3 1.00 1192.
## 62 state_effec… -0.233 -0.232 0.102 0.103 0.0135 2.69e-3 1.05 56.6
## 63 state_effec… -0.152 -0.150 0.0893 0.0883 0.0177 3.58e-3 1.07 26.2
## 64 state_effec… -0.0371 -0.0344 0.153 0.153 0.0103 2.88e-3 1.02 215.
## 65 state_effec… -0.313 -0.314 0.104 0.101 0.0115 3.85e-3 1.04 81.3
## 66 state_effec… -0.167 -0.167 0.214 0.215 0.00630 2.67e-3 1.00 1141.
## # ℹ 1 more variable: ess_tail <dbl>
The error check for the Bayesian model parameters offers insights
into the effects of various predictors on the target variable. The mean
coefficients reveal clear associations:
Illegitimate_Births has a mean of
approximately 0.269, indicating a strong positive effect, while
Large_Families shows a mean of 0.108,
suggesting that higher values correlate with increased predicted
outcomes. In contrast, Families_2Parents
has a mean around -0.470, highlighting a negative association with the
target.
Standard deviations (sd) indicate precision; for instance,
the sd of approximately 0.035 for
Illegitimate_Births suggests a fairly precise estimate,
whereas Families_2Parents has a higher sd
of 0.047, indicating somewhat more uncertainty. The MCSE values
are small for most coefficients (roughly 0.0005 to 0.007), with the
intercept the main exception at 0.017. Employed has a mean
of approximately -0.086 and an sd of 0.028, reflecting a negative
relationship and precise estimation.
R-hat values are close to 1 for most parameters, but not for all: the intercept (beta[1]) has an R-hat of 1.10 with a bulk ESS of only 17.6, and several state effects have R-hat values between 1.05 and 1.09 with bulk ESS below 60. Convergence of the Markov Chain Monte Carlo (MCMC) chains is therefore adequate for most coefficients but not established for these parameters, which would benefit from longer chains. The consistent signs and narrow credible intervals of the main coefficients suggest that the model captures the direction of the relationships among the predictors.
In summary, the analysis points to a model whose main coefficient estimates are reasonably reliable, while the intercept and some state effects need further sampling before firm inferences are drawn from them.
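The link between sd, effective sample size, and Monte Carlo standard error can be checked by hand with the approximation MCSE ≈ sd / sqrt(ESS). The sketch below applies it to three rows of the table above. Using ess_bulk in this formula is an assumption made for illustration; the reported mcse_mean may be based on a slightly different ESS estimate, so only approximate agreement is expected.

```python
import math

# sd, ess_bulk and mcse_mean copied from the summary table above.
rows = {
    "beta[1]": {"sd": 0.0746, "ess_bulk": 17.6, "mcse_mean": 0.0174},
    "beta[4]": {"sd": 0.0345, "ess_bulk": 402.0, "mcse_mean": 0.00173},
    "beta[16]": {"sd": 0.0153, "ess_bulk": 929.0, "mcse_mean": 0.000503},
}


def approx_mcse(sd: float, ess: float) -> float:
    """Approximate Monte Carlo standard error of a posterior mean."""
    return sd / math.sqrt(ess)


approx = {name: approx_mcse(r["sd"], r["ess_bulk"]) for name, r in rows.items()}
for name, value in approx.items():
    reported = rows[name]["mcse_mean"]
    print(f"{name:>8}: approx. MCSE = {value:.5f} (reported {reported:.5f})")
```

The intercept stands out: with a bulk ESS under 20, its Monte Carlo error is roughly a quarter of its posterior sd, which is why its estimate is the least settled of the coefficients.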
Given the considerations from the summary of the models, and the representations of the variables vs target, we went with the following new model:
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1038
## Unobserved stochastic nodes: 1098
## Total graph size: 27790
##
## Initializing model
##
## Iterations = 2001:7000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta[1] 0.28187 0.08927 0.0007289 0.0132747
## beta[2] -0.22555 0.13023 0.0010633 0.0171854
## beta[3] 0.23231 0.02929 0.0002392 0.0012998
## beta[4] 0.22472 0.03192 0.0002606 0.0018564
## beta[5] -0.06465 0.02431 0.0001985 0.0011066
## beta[6] -0.20790 0.13220 0.0010794 0.0172683
## beta[7] -0.44966 0.03980 0.0003250 0.0026165
## beta[8] -0.10275 0.04303 0.0003513 0.0032104
## beta[9] -0.07053 0.01667 0.0001361 0.0006087
## beta[10] -0.12201 0.04605 0.0003760 0.0034841
## beta[11] -0.05658 0.01913 0.0001562 0.0007896
## beta[12] 0.28800 0.20190 0.0016485 0.0335528
## beta[13] -0.18015 0.04613 0.0003767 0.0025671
## beta[14] 0.10203 0.06548 0.0005346 0.0056448
## beta[15] 0.14998 0.05603 0.0004575 0.0039953
## sd_state 0.41303 0.05181 0.0004231 0.0009151
## state_effect[1] 0.74765 0.20102 0.0016413 0.0047948
## state_effect[2] -0.01331 0.11285 0.0009214 0.0047239
## state_effect[3] 0.31206 0.14644 0.0011956 0.0057088
## state_effect[4] 0.30211 0.07837 0.0006399 0.0069964
## state_effect[5] 0.01531 0.10777 0.0008799 0.0054850
## state_effect[6] -0.21775 0.09071 0.0007407 0.0057492
## state_effect[7] 0.27870 0.25181 0.0020560 0.0036403
## state_effect[8] 0.62723 0.08706 0.0007108 0.0053871
## state_effect[9] 0.01063 0.09357 0.0007640 0.0051650
## state_effect[10] -0.09384 0.12473 0.0010184 0.0064186
## state_effect[11] -0.27496 0.10163 0.0008298 0.0050804
## state_effect[12] 0.30852 0.15428 0.0012597 0.0055022
## state_effect[13] 0.12020 0.23552 0.0019230 0.0040429
## state_effect[14] 0.26743 0.09496 0.0007754 0.0050394
## state_effect[15] 0.30258 0.23850 0.0019473 0.0129082
## state_effect[16] -0.64155 0.11729 0.0009576 0.0052317
## state_effect[17] 0.37939 0.13447 0.0010980 0.0042879
## state_effect[18] 0.36884 0.08353 0.0006820 0.0052974
## state_effect[19] 0.03969 0.20924 0.0017084 0.0071597
## state_effect[20] -0.69672 0.13255 0.0010823 0.0065182
## state_effect[21] 0.04379 0.09595 0.0007834 0.0057084
## state_effect[22] -0.03305 0.14072 0.0011490 0.0049856
## state_effect[23] -0.16409 0.26232 0.0021418 0.0116465
## state_effect[24] 0.07407 0.07843 0.0006404 0.0061842
## state_effect[25] 0.37750 0.15960 0.0013031 0.0048896
## state_effect[26] -0.25804 0.11482 0.0009375 0.0059244
## state_effect[27] 0.27787 0.08802 0.0007186 0.0049455
## state_effect[28] -0.72859 0.17631 0.0014396 0.0051532
## state_effect[29] -0.35103 0.08074 0.0006592 0.0052844
## state_effect[30] -0.01764 0.09091 0.0007423 0.0058363
## state_effect[31] -0.23253 0.11624 0.0009491 0.0046149
## state_effect[32] -0.30528 0.09544 0.0007793 0.0059025
## state_effect[33] -0.33222 0.12352 0.0010086 0.0054964
## state_effect[34] 0.84080 0.15463 0.0012625 0.0058129
## state_effect[35] 0.09758 0.17840 0.0014567 0.0042894
## state_effect[36] 0.51005 0.12279 0.0010026 0.0050609
## state_effect[37] 0.30782 0.07882 0.0006436 0.0062091
## state_effect[38] -0.46258 0.13901 0.0011350 0.0050207
## state_effect[39] -0.80252 0.23489 0.0019178 0.0047710
## state_effect[40] -0.21708 0.09909 0.0008091 0.0055879
## state_effect[41] -0.17417 0.08192 0.0006689 0.0059560
## state_effect[42] -0.11901 0.13018 0.0010629 0.0047543
## state_effect[43] -0.36309 0.09892 0.0008077 0.0051453
## state_effect[44] -0.23007 0.20207 0.0016499 0.0041161
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] 0.1194437 0.21884 0.27768 0.34131 0.46807
## beta[2] -0.4830350 -0.31064 -0.22141 -0.13626 0.02214
## beta[3] 0.1795079 0.21169 0.23053 0.25154 0.29230
## beta[4] 0.1651008 0.20192 0.22424 0.24672 0.28770
## beta[5] -0.1137663 -0.08078 -0.06411 -0.04799 -0.01808
## beta[6] -0.4719951 -0.29847 -0.20338 -0.11812 0.04729
## beta[7] -0.5311962 -0.47549 -0.44930 -0.42414 -0.37042
## beta[8] -0.1858968 -0.13231 -0.10162 -0.07452 -0.01856
## beta[9] -0.1037504 -0.08187 -0.07046 -0.05908 -0.03841
## beta[10] -0.2097388 -0.15385 -0.12257 -0.09044 -0.02976
## beta[11] -0.0951394 -0.06909 -0.05589 -0.04316 -0.02147
## beta[12] -0.0887202 0.14837 0.28237 0.42373 0.71190
## beta[13] -0.2716004 -0.21145 -0.17975 -0.14844 -0.09186
## beta[14] -0.0300275 0.06228 0.10207 0.14557 0.22724
## beta[15] 0.0368579 0.11365 0.15003 0.18828 0.25744
## sd_state 0.3242235 0.37695 0.40862 0.44412 0.53049
## state_effect[1] 0.2958014 0.63634 0.76452 0.87995 1.09728
## state_effect[2] -0.2331920 -0.08986 -0.01476 0.06144 0.21140
## state_effect[3] 0.0148212 0.21550 0.31512 0.41387 0.58775
## state_effect[4] 0.1415645 0.25211 0.30326 0.35172 0.45486
## state_effect[5] -0.2015216 -0.05468 0.01588 0.08851 0.22264
## state_effect[6] -0.4015678 -0.27803 -0.21576 -0.15583 -0.04234
## state_effect[7] -0.2827546 0.13583 0.29944 0.44166 0.72691
## state_effect[8] 0.4507743 0.57012 0.62992 0.68711 0.79358
## state_effect[9] -0.1741064 -0.05218 0.01166 0.07298 0.19203
## state_effect[10] -0.3373412 -0.17722 -0.09312 -0.01112 0.15130
## state_effect[11] -0.4747391 -0.34320 -0.27540 -0.20675 -0.07391
## state_effect[12] -0.0006386 0.21046 0.30835 0.40418 0.62093
## state_effect[13] -0.3707596 -0.01913 0.12652 0.26936 0.55805
## state_effect[14] 0.0782968 0.20385 0.26835 0.33192 0.45280
## state_effect[15] -0.0927387 0.12243 0.26821 0.47182 0.79964
## state_effect[16] -0.8662893 -0.71977 -0.64347 -0.56604 -0.40087
## state_effect[17] 0.1173140 0.28881 0.38088 0.47036 0.63979
## state_effect[18] 0.1960442 0.31488 0.37145 0.42439 0.52902
## state_effect[19] -0.4547463 -0.07409 0.06740 0.18287 0.38178
## state_effect[20] -0.9488976 -0.78706 -0.70105 -0.60896 -0.42720
## state_effect[21] -0.1460774 -0.02055 0.04447 0.10968 0.23003
## state_effect[22] -0.3086523 -0.12804 -0.03325 0.06012 0.24293
## state_effect[23] -0.6653640 -0.36184 -0.15377 0.03554 0.29833
## state_effect[24] -0.0871747 0.02281 0.07731 0.12751 0.22286
## state_effect[25] 0.0502745 0.27564 0.37970 0.48401 0.68466
## state_effect[26] -0.4938131 -0.33300 -0.25267 -0.18036 -0.04126
## state_effect[27] 0.1019895 0.21905 0.27961 0.33840 0.44321
## state_effect[28] -1.0734504 -0.84704 -0.73025 -0.61187 -0.37911
## state_effect[29] -0.5151779 -0.40334 -0.34937 -0.29644 -0.19506
## state_effect[30] -0.2002748 -0.07947 -0.01582 0.04482 0.15397
## state_effect[31] -0.4703172 -0.31048 -0.22714 -0.14953 -0.02131
## state_effect[32] -0.4959505 -0.36775 -0.30369 -0.24206 -0.12032
## state_effect[33] -0.5625173 -0.41700 -0.33717 -0.25233 -0.07682
## state_effect[34] 0.5475383 0.73477 0.83602 0.94257 1.15842
## state_effect[35] -0.2551529 -0.01807 0.09729 0.21406 0.45461
## state_effect[36] 0.2625563 0.42917 0.51158 0.59408 0.74834
## state_effect[37] 0.1467823 0.25594 0.31067 0.36171 0.45690
## state_effect[38] -0.7291456 -0.55352 -0.46636 -0.37321 -0.17492
## state_effect[39] -1.2066154 -0.96047 -0.82292 -0.66827 -0.26643
## state_effect[40] -0.4156266 -0.28225 -0.21576 -0.14994 -0.02420
## state_effect[41] -0.3374034 -0.22887 -0.17249 -0.11942 -0.01459
## state_effect[42] -0.3772676 -0.20586 -0.11681 -0.03112 0.13019
## state_effect[43] -0.5651941 -0.42700 -0.36073 -0.29639 -0.17427
## state_effect[44] -0.6196202 -0.36784 -0.22859 -0.09726 0.16785
The summary of the Bayesian model parameters provides an insightful analysis of the predictors’ effects on the response variable.
The empirical means of the beta coefficients reveal several notable
relationships. For instance, the coefficient for
log(1 + abs(Below_Poverty)) has a mean of approximately
-0.2256, indicating that as the logged value of below-poverty increases,
the outcome variable tends to decrease, suggesting that higher poverty
levels are associated with lower predicted values. In contrast, the
coefficient for Illegitimate_Births has a mean around
0.2323, signifying a positive effect, meaning that higher rates of
illegitimate births are linked to higher values of the outcome.
Other variables also present interesting dynamics. The coefficient
for Median_Income[i] shows a mean of 0.2247, suggesting
that as median income increases, the target outcome also tends to
increase, indicating a positive association once the other predictors
are accounted for. Conversely, the coefficient
for Families_2Parents[i] exhibits a mean around -0.4497,
implying that an increase in two-parent families is associated with a decrease
in the target variable, highlighting the complexity of social dynamics
at play.
The standard deviations (SD) of these coefficients provide insight
into the reliability of the estimates. For example,
Illegitimate_Births has a low SD of approximately 0.029,
indicating a precise estimate, whereas log(1 + abs(Below_Poverty))
shows a much higher SD of around 0.130, suggesting greater variability and
uncertainty in its influence.
The quantiles further illustrate the range and uncertainty
surrounding these estimates. For instance, the 95% credible interval for
Illegitimate_Births ranges from -0.4915 to -0.1341,
indicating strong confidence in its negative association with the target
variable. Meanwhile, log(1 + abs(Below_Poverty)) has a
broader range from -0.4031 to -0.0119, suggesting a more uncertain
relationship.
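An equal-tailed credible interval of this kind is simply a pair of quantiles of the posterior draws. The following is a minimal Python sketch of that computation, for illustration only: the function name `credible_interval` and the synthetic draws are assumptions and do not come from the fitted JAGS model.

```python
import random

def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from a list of posterior draws."""
    if not 0 < level < 1:
        raise ValueError("level must lie strictly between 0 and 1")
    ordered = sorted(draws)
    n = len(ordered)
    alpha = (1 - level) / 2

    def quantile(p):
        # Linear interpolation between neighbouring order statistics.
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return ordered[lo] * (1 - frac) + ordered[hi] * frac

    return quantile(alpha), quantile(1 - alpha)

# Synthetic draws standing in for MCMC output (hypothetical values).
rng = random.Random(0)
draws = [rng.gauss(-0.22, 0.10) for _ in range(4000)]
lower, upper = credible_interval(draws)
print(f"95% interval: [{lower:.3f}, {upper:.3f}]")
```

An interval that lies entirely on one side of zero supports a consistent sign for the coefficient, while an interval whose bound approaches zero signals a weaker conclusion.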
In conclusion, this Bayesian model’s output highlights important
predictors like Below_Poverty,
Illegitimate_Births, and Median_Income, along
with their effects on the target variable. The reliable estimates,
variability insights, and credible intervals help inform understanding
of these relationships, making this model a valuable tool for analysis
and inference.
To diagnose the model, we compared the posterior predictive check plots and the DIC scores:
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1038
## Unobserved stochastic nodes: 2136
## Total graph size: 30904
##
## Initializing model
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1038
## Unobserved stochastic nodes: 1098
## Total graph size: 27790
##
## Initializing model
## [1] "Single DIC value for the improved 2nd model: -0.697293925974766"
The analysis of the posterior predictive checks and DIC scores demonstrates a clear improvement in model performance from the base model to the final model. The final model’s predictions align more closely with the observed data, and its DIC score is the lowest, indicating that it provides the best fit while appropriately penalizing model complexity. This suggests that the refinements made in the final model, such as including interaction terms and potentially non-linear relationships, have improved its predictive accuracy for crime rates.
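For reference, the textbook definition of DIC combines the posterior mean deviance with a complexity penalty. This is stated here as the standard form only; the penalty that JAGS computes may differ in detail, so the expression below is not necessarily the exact quantity printed above.

```latex
D(\theta) = -2 \log p(y \mid \theta), \qquad
\mathrm{DIC} = \bar{D} + p_D, \qquad
p_D = \bar{D} - D(\bar{\theta})
```

Here \bar{D} is the posterior mean of the deviance and p_D is the effective number of parameters, so a lower DIC reflects a better trade-off between fit and complexity.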
Traceplot
Density plot
The diagnostic plots for the final model provide valuable insights into the convergence and reliability of parameter estimates. The trace plots generally indicate good mixing and stationarity, with chains fluctuating around a stable mean and showing no evident trends. However, some betas display less reliable convergence, suggesting instability in their estimates. The 95% credible intervals remain relatively narrow and consistent across most chains, reinforcing the assessment of convergence for the majority of parameters.
The density plots support these findings, revealing that the posterior distributions have generally converged well, as evidenced by overlapping density curves for each chain. While most parameters show smooth, unimodal distributions, some betas exhibit less definitive shapes, indicating that further examination may be necessary.
Overall, the diagnostic plots indicate a strong performance for most parameters, but careful attention should be directed toward those with less reliable convergence, as they could impact the robustness of the model’s conclusions.
After the convergence diagnostics, we check the accuracy of our Bayesian hierarchical model with the following statistics:
## # A tibble: 60 × 10
## variable mean median sd mad mcse_mean mcse_sd rhat ess_bulk
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 beta[1] 0.263 0.267 0.0831 0.0815 0.0170 6.22e-3 1.03 23.5
## 2 beta[2] -0.202 -0.194 0.108 0.108 0.0193 6.48e-3 1.05 31.0
## 3 beta[3] 0.228 0.226 0.0280 0.0270 0.00124 8.81e-4 1.00 540.
## 4 beta[4] 0.222 0.222 0.0336 0.0324 0.00213 1.09e-3 1.00 252.
## 5 beta[5] -0.0625 -0.0624 0.0237 0.0237 0.00114 5.15e-4 1.00 434.
## 6 beta[6] -0.188 -0.189 0.122 0.123 0.0257 9.37e-3 1.07 23.1
## 7 beta[7] -0.447 -0.447 0.0393 0.0386 0.00257 1.23e-3 1.00 233.
## 8 beta[8] -0.104 -0.104 0.0444 0.0442 0.00365 1.81e-3 1.01 148.
## 9 beta[9] -0.0704 -0.0703 0.0166 0.0168 0.000718 3.05e-4 1.00 536.
## 10 beta[10] -0.111 -0.110 0.0429 0.0436 0.00441 1.69e-3 1.01 94.6
## 11 beta[11] -0.0550 -0.0541 0.0185 0.0183 0.000780 4.52e-4 1.00 570.
## 12 beta[12] 0.250 0.251 0.177 0.187 0.0371 1.34e-2 1.07 23.4
## 13 beta[13] -0.176 -0.176 0.0464 0.0461 0.00293 1.53e-3 1.01 251.
## 14 beta[14] 0.0891 0.0889 0.0638 0.0685 0.00709 2.66e-3 1.00 82.3
## 15 beta[15] 0.146 0.147 0.0562 0.0554 0.00489 2.37e-3 1.00 132.
## 16 sd_state 0.416 0.411 0.0519 0.0507 0.00113 5.31e-4 1.00 2134.
## 17 state_effec… 0.757 0.772 0.202 0.183 0.00879 3.24e-3 1.00 463.
## 18 state_effec… -0.00424 -0.00825 0.122 0.121 0.0106 2.86e-3 1.00 132.
## 19 state_effec… 0.313 0.317 0.152 0.152 0.00805 2.50e-3 1.00 359.
## 20 state_effec… 0.311 0.307 0.0809 0.0807 0.0101 4.09e-3 1.00 65.1
## 21 state_effec… 0.0240 0.0252 0.114 0.112 0.00949 2.88e-3 1.00 143.
## 22 state_effec… -0.213 -0.215 0.0943 0.0919 0.00908 3.53e-3 1.00 108.
## 23 state_effec… 0.294 0.313 0.251 0.223 0.00476 3.27e-3 1.00 1990.
## 24 state_effec… 0.637 0.633 0.0947 0.0955 0.0100 3.36e-3 1.00 90.1
## 25 state_effec… 0.0190 0.0155 0.0971 0.0957 0.00876 3.09e-3 1.00 124.
## 26 state_effec… -0.0873 -0.0879 0.126 0.126 0.00893 2.30e-3 1.00 198.
## 27 state_effec… -0.264 -0.266 0.105 0.105 0.0106 2.16e-3 1.00 98.5
## 28 state_effec… 0.309 0.309 0.156 0.143 0.00770 4.82e-3 1.00 400.
## 29 state_effec… 0.140 0.146 0.233 0.215 0.00554 2.61e-3 1.00 1695.
## 30 state_effec… 0.276 0.273 0.0988 0.101 0.00923 2.63e-3 1.00 115.
## 31 state_effec… 0.345 0.305 0.270 0.296 0.0221 7.14e-3 1.00 152.
## 32 state_effec… -0.640 -0.645 0.121 0.118 0.00868 2.50e-3 1.00 195.
## 33 state_effec… 0.389 0.390 0.139 0.137 0.00919 2.36e-3 1.00 229.
## 34 state_effec… 0.373 0.371 0.0874 0.0871 0.00915 3.22e-3 1.00 91.4
## 35 state_effec… 0.0570 0.0819 0.204 0.179 0.00784 4.55e-3 1.00 600.
## 36 state_effec… -0.689 -0.691 0.129 0.130 0.0102 2.71e-3 1.01 160.
## 37 state_effec… 0.0488 0.0465 0.0982 0.0985 0.00845 2.85e-3 1.00 135.
## 38 state_effec… -0.0163 -0.0166 0.142 0.139 0.00935 2.27e-3 1.00 233.
## 39 state_effec… -0.174 -0.178 0.257 0.285 0.0119 4.29e-3 1.00 469.
## 40 state_effec… 0.0803 0.0762 0.0853 0.0867 0.0107 4.26e-3 1.01 63.9
## 41 state_effec… 0.384 0.388 0.159 0.149 0.00902 3.20e-3 1.00 301.
## 42 state_effec… -0.252 -0.253 0.124 0.124 0.0108 3.90e-3 1.00 132.
## 43 state_effec… 0.289 0.286 0.0929 0.0947 0.00986 2.77e-3 1.00 88.5
## 44 state_effec… -0.722 -0.726 0.177 0.173 0.00672 2.47e-3 1.00 695.
## 45 state_effec… -0.344 -0.348 0.0881 0.0890 0.0102 3.47e-3 1.00 75.5
## 46 state_effec… -0.00517 -0.00792 0.0953 0.0962 0.00965 3.09e-3 1.00 97.4
## 47 state_effec… -0.231 -0.228 0.121 0.122 0.0102 2.88e-3 1.00 140.
## 48 state_effec… -0.299 -0.302 0.0994 0.102 0.00986 2.65e-3 1.00 102.
## 49 state_effec… -0.326 -0.330 0.125 0.123 0.00809 2.42e-3 1.00 240.
## 50 state_effec… 0.864 0.856 0.157 0.155 0.00832 3.16e-3 1.01 366.
## 51 state_effec… 0.101 0.0998 0.182 0.177 0.00770 2.27e-3 1.00 555.
## 52 state_effec… 0.517 0.517 0.126 0.125 0.00990 3.05e-3 1.00 163.
## 53 state_effec… 0.317 0.312 0.0867 0.0891 0.0101 3.25e-3 1.00 74.4
## 54 state_effec… -0.461 -0.467 0.145 0.140 0.00995 4.16e-3 1.00 214.
## 55 state_effec… -0.800 -0.819 0.230 0.211 0.00550 2.96e-3 1.00 1624.
## 56 state_effec… -0.204 -0.203 0.103 0.104 0.00916 2.40e-3 1.00 127.
## 57 state_effec… -0.166 -0.169 0.0881 0.0894 0.00995 3.50e-3 1.00 79.0
## 58 state_effec… -0.117 -0.117 0.134 0.134 0.00805 2.29e-3 1.00 276.
## 59 state_effec… -0.359 -0.361 0.100 0.0995 0.00991 3.39e-3 1.00 103.
## 60 state_effec… -0.227 -0.224 0.197 0.197 0.00578 1.89e-3 1.00 1157.
## # ℹ 1 more variable: ess_tail <dbl>
The summary of the Bayesian model parameters clarifies the
relationships between the predictors and the target variable.
The mean coefficient for
Families_2Parents (beta[7]) is
approximately -0.447, indicating a negative association with the target
that agrees with the earlier parameter summary, while
beta[11] has a mean of
about -0.055, suggesting a slight negative relationship.
The standard deviations (sd) reflect the precision
of these estimates. For instance, beta[11]
has a low sd of 0.019, indicating high precision, whereas
beta[12] has a much higher sd of
0.177, reflecting greater posterior uncertainty.
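Monte Carlo Standard Error (MCSE) values are small for most
parameters, with mcse_mean ranging from about 0.0007 (beta[9]) to
0.037 (beta[12]) among the coefficients. The
R-hat values are at or near 1.00 for most parameters,
indicating good convergence of the MCMC chains. The exceptions are
beta[1], beta[2], beta[6], and beta[12], whose R-hat values range from
1.03 to 1.07 and whose bulk effective sample sizes lie between 23 and
31, so their posterior summaries should be treated with caution.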
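The R-hat diagnostic compares the variance between chains with the variance within chains. As a minimal Python sketch, assuming the classic Gelman-Rubin form rather than the rank-normalized split variant that the `rhat` column in the table may be based on, it can be computed as follows. The function name `gelman_rubin_rhat` and the synthetic chains are illustrative and not part of the original analysis.

```python
import random
import statistics

def gelman_rubin_rhat(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: between-chain variance, scaled by the chain length.
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * within + between / n
    return (var_hat / within) ** 0.5

rng = random.Random(1)
# Three chains sampling the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(3)]
# One chain stuck around a different value (poorly mixed).
stuck = [[rng.gauss(mu, 1.0) for _ in range(500)] for mu in (0.0, 0.0, 2.0)]

print("well-mixed chains:", round(gelman_rubin_rhat(mixed), 3))
print("stuck chain:      ", round(gelman_rubin_rhat(stuck), 3))
```

Values close to 1 indicate that the chains agree; values noticeably above 1 indicate that at least one chain is exploring a different region.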
In summary, most parameters of this Bayesian model show reliable
estimates and good convergence, enhancing understanding of covariate
relationships. However, the elevated R-hat values and low effective
sample sizes for beta[1], beta[2], beta[6], and beta[12] suggest that
longer chains or further investigation are needed before drawing
conclusions from those coefficients.
For the frequentist analysis, we will use a Generalized Linear Mixed Model (GLMM). This model is suitable because, like the Bayesian hierarchical model, it can handle nested or grouped data and incorporate random effects to account for variability between different levels of the hierarchy. Using a GLMM allows for a direct comparison with the Bayesian approach, enhancing the robustness of the analysis of the influence of social norms and community interactions on crime rates.
## Family: beta ( logit )
## Formula:
## bc_target_bayes ~ log_Below_Poverty + Illegitimate_Births + Large_Families +
## Inc_from_inv + log_Median_Income + Families_2Parents + Teen_2Par +
## Working_mom + Welfare_Public_Assist + Illegitimate_Births_Welfare_Public_Assist +
## log_Median_Below_Poverty + Large_Families_log_Below_Poverty +
## Welfare_log_Below_Poverty + Teen_2Par_log_Below_Poverty + (1 | State)
## Data: data
##
## AIC BIC logLik deviance df.resid
## -1117.5 -1033.5 575.8 -1151.5 1021
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## State (Intercept) 0.103 0.3209
## Number of obs: 1038, groups: State, 44
##
## Dispersion parameter for beta family (): 10.6
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.21158 0.10527 2.010 0.044454
## log_Below_Poverty -0.19303 0.17340 -1.113 0.265615
## Illegitimate_Births 0.20653 0.03466 5.959 2.54e-09
## Large_Families 0.17560 0.04959 3.541 0.000398
## Inc_from_inv -0.10812 0.03719 -2.907 0.003651
## log_Median_Income -0.04780 0.20252 -0.236 0.813409
## Families_2Parents -0.45452 0.05740 -7.919 2.40e-15
## Teen_2Par -0.12298 0.06357 -1.934 0.053057
## Working_mom -0.06560 0.02540 -2.583 0.009797
## Welfare_Public_Assist -0.09346 0.06593 -1.418 0.156329
## Illegitimate_Births_Welfare_Public_Assist -0.05557 0.02287 -2.430 0.015100
## log_Median_Below_Poverty 0.20875 0.28898 0.722 0.470071
## Large_Families_log_Below_Poverty -0.14589 0.07390 -1.974 0.048372
## Welfare_log_Below_Poverty 0.06627 0.08911 0.744 0.457065
## Teen_2Par_log_Below_Poverty 0.16415 0.08479 1.936 0.052878
##
## (Intercept) *
## log_Below_Poverty
## Illegitimate_Births ***
## Large_Families ***
## Inc_from_inv **
## log_Median_Income
## Families_2Parents ***
## Teen_2Par .
## Working_mom **
## Welfare_Public_Assist
## Illegitimate_Births_Welfare_Public_Assist *
## log_Median_Below_Poverty
## Large_Families_log_Below_Poverty *
## Welfare_log_Below_Poverty
## Teen_2Par_log_Below_Poverty .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
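The fitted structure reported in the summary (beta family, logit link, random intercept for State) can be written out as below. This is a sketch of the standard mean-precision parameterization of beta regression; the symbols are introduced here for illustration, and the numerical values are taken from the output above.

```latex
y_{ij} \mid u_j \sim \mathrm{Beta}\bigl(\mu_{ij}\,\phi,\; (1-\mu_{ij})\,\phi\bigr), \qquad
\operatorname{logit}(\mu_{ij}) = x_{ij}^{\top}\beta + u_j, \qquad
u_j \sim \mathcal{N}(0, \sigma_u^2)

\operatorname{Var}(y_{ij} \mid u_j) = \frac{\mu_{ij}\,(1-\mu_{ij})}{1+\phi},
\qquad \hat{\sigma}_u^2 = 0.103, \quad \hat{\phi} = 10.6
```

Here i indexes communities and j indexes states. At \mu = 0.5 the conditional variance is 0.25 / 11.6 \approx 0.0216, which corresponds to a standard deviation of about 0.147.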
The model identifies Families_2Parents,
Illegitimate_Births, Large_Families, Inc_from_inv,
Working_mom, and the interaction terms
Illegitimate_Births_Welfare_Public_Assist and
Large_Families_log_Below_Poverty as significant
predictors of the target variable at the 5% level. Families_2Parents has a
strong negative effect (estimate -0.455), indicating that higher proportions
of two-parent families are associated with a decrease in the target variable.
Illegitimate_Births (0.207) and Large_Families (0.176) show strong positive
associations, while Inc_from_inv and Working_mom both have
significant negative effects. The negative interaction term
Illegitimate_Births_Welfare_Public_Assist suggests that
the positive association of illegitimate births with the target weakens
as welfare assistance increases. The Large_Families_log_Below_Poverty
interaction is also negative, but only marginally significant (p = 0.048).
The remaining income and poverty variables (log_Below_Poverty,
log_Median_Income, Welfare_Public_Assist, and the interactions
log_Median_Below_Poverty and Welfare_log_Below_Poverty) are not
statistically significant, while Teen_2Par and
Teen_2Par_log_Below_Poverty are significant only at the 10% level.
These terms therefore do not show a strong or consistent impact on the
target variable in this dataset.
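To make the role of the Illegitimate_Births_Welfare_Public_Assist interaction concrete, the effect of Illegitimate_Births on the linear predictor \eta can be written as a function of Welfare_Public_Assist. This worked expression assumes that the interaction column is the plain product of the two predictors, on whatever scale they enter the model, which the summary does not state explicitly.

```latex
\frac{\partial \eta}{\partial\, \text{Illegitimate\_Births}}
  = 0.20653 - 0.05557 \times \text{Welfare\_Public\_Assist}
```

Under that assumption, the slope stays positive for Welfare_Public_Assist values below 0.20653 / 0.05557 \approx 3.72 and shrinks steadily as welfare assistance increases.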
The random effect for State shows a moderate variance (0.103, or a
standard deviation of 0.321 on the logit scale),
suggesting that there are differences in baseline levels across states,
which justifies the inclusion of state-level random effects to account
for this variability.
Based on the summary of the model, the presence of some predictors with large standard errors suggests the need for further diagnostics for zero-inflation and for over- or underdispersion. The dispersion parameter reported for the beta family (10.6) acts as a precision parameter, so a larger value corresponds to less residual variance and does not by itself indicate overdispersion. Conducting these tests will help ensure that the model is appropriately specified and that the parameter estimates are reliable.
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.8372, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
## DHARMa zero-inflation test via comparison to expected zeros with
## simulation under H0 = fitted model
##
## data: simulationOutput
## ratioObsSim = NaN, p-value = 1
## alternative hypothesis: two.sided
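The dispersion statistic (0.84) is below 1, so the model shows no sign of overdispersion; the very small p-value instead points to mild underdispersion, meaning that the residuals vary somewhat less than the fitted model predicts. The zero-inflation test is uninformative here: the ratio is NaN, presumably because neither the observed response nor the simulations contain zeros, so zero-inflation is not a concern. Overall, the current model specification appears broadly appropriate for the data, with the underdispersion worth keeping in mind.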
Subsequently, a plot of observed versus predicted crime rates was created to evaluate the model’s predictive performance:
The plot shows that the glmmTMB model captures the general trend in the crime rate data, with predicted values generally increasing as observed crime rates rise. However, there is noticeable dispersion around the line of equality (the red line), especially at higher crime rates, where predictions tend to be more variable. This suggests that while the model provides a reasonable fit overall, it may benefit from further refinement to improve accuracy for cases with higher observed crime rates.
Now we can compare this model with the corresponding Bayesian model through the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE):
## RMSE for the glmmTMB model: 0.1234653
## MAE for the glmmTMB model: 0.09527117
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1038
## Unobserved stochastic nodes: 2136
## Total graph size: 30904
##
## Initializing model
## RMSE for JAGS model: 0.1236841
## MAE for JAGS model: 0.09278549
The Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) are commonly used metrics to evaluate how closely a model’s predictions align with actual values, with RMSE being more sensitive to larger deviations. For the Bayesian model implemented in JAGS, the RMSE is approximately 0.1237, and the MAE is about 0.0928. These values suggest that the model’s predictions are generally close to the observed data, with a typical prediction error of around 0.093. The gap between the RMSE and the MAE reflects a few instances of larger errors, though these remain limited.
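The two metrics can be sketched in a few lines of Python. This is an illustration only: the function names `rmse` and `mae` and the toy vectors are assumptions, and the values reported above come from the fitted glmmTMB and JAGS models rather than from this code.

```python
import math

def rmse(observed, predicted):
    """Root mean square error: penalizes large deviations more heavily."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(observed, predicted):
    """Mean absolute error: the average size of the prediction error."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Toy values (hypothetical); the last prediction is deliberately far off.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.30, 0.60]

print("RMSE:", round(rmse(observed, predicted), 4))
print("MAE: ", round(mae(observed, predicted), 4))
```

In the toy data a single large miss pulls the RMSE well above the MAE, which is the same mechanism that separates the two metrics in the model comparison.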
In comparison, the frequentist glmmTMB model has an RMSE
of approximately 0.1235 and an MAE of 0.0953. This performance is also
strong, with an RMSE that is marginally lower than that of the JAGS
model and an MAE that is slightly higher. These values suggest that the
glmmTMB model’s predictions are about as accurate as those of the
Bayesian model, with a slightly larger average error.
The small differences in these metrics indicate that both models
handle the data well, with only minor distinctions in predictive
accuracy. The JAGS model shows a slight advantage in MAE, which points
to a somewhat smaller average error per prediction, whereas the
glmmTMB model’s marginally lower RMSE suggests that it may
be slightly more robust against larger deviations.
In summary, both models achieve nearly identical predictive
performance, with the JAGS model having a very slight edge in average
error consistency (MAE) and the glmmTMB model performing
slightly better on larger deviations (RMSE). Given this close
performance, either model would be a reasonable choice, allowing for
selection based on specific analytical needs or preferences, such as the
Bayesian model’s ability to incorporate prior information or the
familiarity of the frequentist approach.
Model validation
The next logical step is to check whether the model violates any
assumptions. This is a crucial part of validating the model to ensure
that the inferences and predictions are reliable.
The key assumptions checked here for the GLMM (Generalized Linear
Mixed Model) are the behavior of the residuals against the fitted
values and the normality of the residuals.
The diagnostic plots suggest that the glmmTMB model
generally fits the data well, with assumptions largely being met.
In the Residuals vs. Fitted Values plot, residuals are symmetrically scattered around zero, indicating that the model captures the main data structure effectively. There is a slight increase in residual spread at higher fitted values, suggesting mild heteroscedasticity. This pattern is not extreme and may not require any adjustment, though a response transformation or robust standard errors could be considered if needed.
The Q-Q plot shows that residuals mostly follow the normality line, suggesting that the assumption of normality is broadly met. Minor deviations at the tails indicate a few outliers, but these are not substantial enough to impact the model’s validity significantly.
In summary, the model assumptions are reasonably well met. The slight heteroscedasticity and minor tail deviations are not severe, so the current model should be adequate. Further adjustments would only be necessary if a more refined fit is required, and they may yield only minimal improvement.
In this analysis, both a Bayesian hierarchical model
(implemented in JAGS) and a frequentist Generalized Linear Mixed
Model (GLMM) (using glmmTMB) were developed to
assess the effects of socio-economic factors on community-level crime
rates, with states as the grouping level.
Each approach offers distinct advantages suited to the dataset, but
both share a limitation in capturing all influences on crime rates,
likely due to unobserved factors or latent variables.
The Bayesian model demonstrates flexibility by incorporating prior information, which is particularly advantageous in managing data sparsity and noise. Its hierarchical structure allows for nuanced modeling of between-state variability, and posterior predictive checks indicated a strong overall fit to the observed data. However, further examination of density plots and residual patterns revealed systematic variations that the model could not entirely capture, suggesting the potential for hidden sub-group differences or unobserved heterogeneity. These findings indicate that a Bayesian mixture model might be beneficial, as it would enable better handling of multi-modal data, where certain socio-economic factors may influence crime differently across sub-populations.
In contrast, the frequentist GLMM avoids reliance on prior distributions, allowing for straightforward estimation and interpretation, which is practical in many applied settings. This model also showed a robust fit, with diagnostic checks revealing only mild issues in residual normality and heteroscedasticity. However, similar to the Bayesian model, the frequentist approach displayed some instability when predicting higher crime rates, which hints at the existence of latent factors not accounted for in the dataset. As with the Bayesian model, a frequentist mixture model could potentially address this by isolating unobserved sub-populations within the data.
Both models suggest that the dataset may not follow a single, unified distribution, likely due to the influence of sub-groups with distinct socio-economic characteristics that affect crime differently. This latent structure could represent community-level socio-cultural dynamics or unique structural inequalities, which remain unmeasured. Thus, both Bayesian and frequentist approaches would benefit from a mixture model to effectively capture multi-modal data patterns and unobserved sub-group effects.
In summary, while both the Bayesian and frequentist models offer valuable insights into the socio-economic determinants of crime, their inability to fully capture the complexity of the data points to an underlying multi-modal structure. The presence of unobserved, latent factors likely drives this multi-modality, reflecting socio-cultural or structural differences between communities that impact crime rates. A mixture model would allow each latent group’s unique characteristics to be incorporated, enhancing predictive accuracy and interpretability in both modeling frameworks.
In the Bayesian setting, a mixture model could leverage the hierarchical structure to account for inter-state variability while distinguishing latent sub-populations. For the frequentist approach, this addition would yield interpretable parameters for each sub-group, thereby improving the model’s performance on observations with high crime rates.
Overall, both models, by incorporating a mixture framework, would more accurately reflect the dataset’s underlying complexity and provide a more comprehensive understanding of the socio-economic and potentially cultural determinants driving crime across different communities.